SP Transit Node Classifier

Classifies bus stops in São Paulo's transit network as Hub, Intermediate, or Peripheral based on graph features and geographic coordinates.

The goal is to predict each stop's betweenness centrality class without computing betweenness itself, which is computationally expensive for large networks.

How to Use

import joblib
import numpy as np
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="cintia-shinoda/sp-transit-node-classifier",
    filename="model.joblib",
)
model = joblib.load(path)

# One row per stop; columns in this order:
# [degree, degree_centrality, closeness_centrality, lat, lon]
node = np.array([[8, 0.00036, 0.018, -23.55, -46.63]])
pred = model.predict(node)
# Predicted class ids: 0 = Peripheral, 1 = Intermediate, 2 = Hub
print(pred)
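
The class ids returned by the model can be turned into readable names with a small lookup. The sketch below is illustrative only: `CLASS_NAMES` and `decode` are hypothetical helpers, not part of the published model, and the mapping is taken from the comment in the snippet above.

```python
# Mapping from class id to label, as documented in the usage snippet.
CLASS_NAMES = {0: "Peripheral", 1: "Intermediate", 2: "Hub"}


def decode(predictions):
    """Convert an iterable of class ids (ints or NumPy integers) to label names."""
    return [CLASS_NAMES[int(p)] for p in predictions]


# Stand-in for the output of model.predict(...) on three stops.
example_predictions = [2, 0, 1]
labels = decode(example_predictions)
print(labels)
```

With a loaded model, the same helper would be called as `decode(model.predict(node))`.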

Features

Feature	Description
degree	Number of direct connections
degree_centrality	Degree divided by n - 1, where n is the number of stops
closeness_centrality	Inverse of the average shortest-path distance to the other stops
lat	Latitude
lon	Longitude
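
As a rough illustration of how a feature row could be assembled, the sketch below computes the three graph features for one stop using only the standard library. It assumes an unweighted, undirected graph stored as an adjacency dict and uses the common textbook definitions, with closeness scaled by the fraction of reachable nodes. The model card does not state how the original features were computed, so `node_features` is a hypothetical helper rather than the author's pipeline. The degree definition is at least consistent with the sample input above: 8 / (21,892 - 1) ≈ 0.00036.

```python
from collections import deque


def node_features(adjacency, node, lat, lon):
    """Return [degree, degree_centrality, closeness_centrality, lat, lon] for one stop."""
    n = len(adjacency)
    degree = len(adjacency[node])
    degree_centrality = degree / (n - 1) if n > 1 else 0.0

    # Breadth-first search gives hop distances from `node` to every reachable stop.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)

    reachable = len(dist) - 1
    total_distance = sum(dist.values())
    closeness = reachable / total_distance if total_distance > 0 else 0.0
    # Scale by the fraction of stops reached, so isolated clusters are not overrated.
    if n > 1:
        closeness *= reachable / (n - 1)

    return [degree, degree_centrality, closeness, lat, lon]


# Toy network: three stops in a line, a - b - c (coordinates are placeholders).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
row_b = node_features(toy, "b", -23.55, -46.63)
row_a = node_features(toy, "a", -23.56, -46.64)
```

On a network of this size, a graph library would normally replace the hand-written search; the point here is only the shape and order of the feature row.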

Metrics

Metric	Value
F1 Macro (test)	0.59
Accuracy (test)	0.68
F1 Macro (5-fold CV)	0.43
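
Macro F1 averages the per-class F1 scores with equal weight, so it penalizes poor performance on rare classes much more than accuracy does. The sketch below implements both metrics in plain Python and applies them to a majority-class baseline. It assumes, purely for illustration, a 100-stop test set with the class shares listed under Limitations (66 / 24 / 10); the actual test split is not described in this card.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical test set mirroring the documented class shares.
y_true = [0] * 66 + [1] * 24 + [2] * 10
# Baseline that always predicts the majority class (Peripheral).
y_baseline = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, y_baseline)
baseline_macro_f1 = macro_f1(y_true, y_baseline)
print(round(baseline_accuracy, 2), round(baseline_macro_f1, 2))
```

Under that assumption the baseline reaches an accuracy of 0.66 and a macro F1 of about 0.27, which is why macro F1 is the more informative of the two numbers reported here.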

Feature Importance

Feature	Importance
lat	0.2793
lon	0.2604
closeness_centrality	0.2566
degree	0.1061
degree_centrality	0.0976

Key Finding

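Geographic position is the strongest predictor of hub status: lat and lon together account for about 54% of total feature importance (0.2793 + 0.2604 = 0.5397). This suggests that high-centrality stops concentrate in specific corridors of São Paulo. Taken individually, closeness_centrality (0.2566) is nearly as important as either coordinate.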
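
The grouping behind this finding can be reproduced directly from the Feature Importance table. The short sketch below sums the importances by group; the values are copied from the table, while the group names `geographic` and `graph` are labels introduced here for illustration.

```python
# Importances copied from the Feature Importance table.
importance = {
    "lat": 0.2793,
    "lon": 0.2604,
    "closeness_centrality": 0.2566,
    "degree": 0.1061,
    "degree_centrality": 0.0976,
}

# Hypothetical grouping: coordinates versus graph-derived features.
groups = {
    "geographic": ["lat", "lon"],
    "graph": ["closeness_centrality", "degree", "degree_centrality"],
}

group_importance = {
    name: sum(importance[feature] for feature in features)
    for name, features in groups.items()
}

for name, value in group_importance.items():
    print(f"{name}: {value:.4f}")
```

The two coordinates together slightly outweigh the three graph features combined, even though no single coordinate dominates closeness centrality on its own.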

Limitations

Labels derived from betweenness centrality quantiles — simplified classification
Trained on a single GTFS snapshot — may not generalize to network changes
Does not consider temporal patterns (peak vs. off-peak)
Class imbalance: 66% Peripheral, 24% Intermediate, 10% Hub
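
To make the first and last limitations concrete, the sketch below shows one way quantile-based labels could be assigned from betweenness values. The cut points (66th and 90th percentiles) are an assumption inferred from the class shares above; the card does not state the actual thresholds or quantile method, and `quantile_labels` is a hypothetical helper.

```python
def quantile_labels(betweenness, cut_percents=(66, 90)):
    """Label each value 0 (Peripheral), 1 (Intermediate), or 2 (Hub) by quantile.

    Uses nearest-rank percentiles with integer arithmetic. The cut points are
    assumed, not taken from the original labeling code.
    """
    ordered = sorted(betweenness)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank index: ceil(p * n / 100) - 1, clamped to the valid range.
        rank = (p * n + 99) // 100
        return ordered[min(n - 1, max(0, rank - 1))]

    low, high = percentile(cut_percents[0]), percentile(cut_percents[1])

    labels = []
    for value in betweenness:
        if value > high:
            labels.append(2)
        elif value > low:
            labels.append(1)
        else:
            labels.append(0)
    return labels


# Toy betweenness scores: 100 distinct values, so the shares come out exact.
toy_scores = list(range(100))
toy_labels = quantile_labels(toy_scores)
counts = {c: toy_labels.count(c) for c in (0, 1, 2)}
print(counts)
```

On real networks many stops share a betweenness of zero, so ties at the lower cut point can shift the shares away from the nominal percentages.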

Dataset

SP Transit Network Centrality — 21,892 bus stops with graph centrality metrics.

Citation

@misc{shinoda2026sp-classifier,
  author = {Cintia Shinoda},
  title = {SP Transit Node Classifier},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/cintia-shinoda/sp-transit-node-classifier}
}
