SP Transit Node Classifier
Classifies bus stops in São Paulo's transit network as Hub, Intermediate, or Peripheral based on graph features and geographic coordinates.
The goal is to predict a stop's betweenness centrality class without computing betweenness itself, which is computationally expensive for large networks.
How to Use
```python
import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Download the serialized model file from the Hugging Face Hub
path = hf_hub_download(
    repo_id="cintia-shinoda/sp-transit-node-classifier",
    filename="model.joblib",
)
# joblib.load unpickles the file; only load models from sources you trust
model = joblib.load(path)

# Input: [degree, degree_centrality, closeness_centrality, lat, lon]
node = np.array([[8, 0.00036, 0.018, -23.55, -46.63]])
pred = model.predict(node)  # one integer class per input row
# 0 = Peripheral, 1 = Intermediate, 2 = Hub
```
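The model takes a bare 5-column array, so feature order matters. The sketch below is a hypothetical convenience layer (the names `FEATURE_ORDER`, `CLASS_NAMES`, `make_row`, and `decode` are illustrative and not part of the published model): it builds a row in the documented order from a dict and maps integer predictions back to the class names listed in the usage example.

```python
# Hypothetical helper: builds the 5-column input in the order the model
# card documents: [degree, degree_centrality, closeness_centrality, lat, lon].
FEATURE_ORDER = ["degree", "degree_centrality", "closeness_centrality", "lat", "lon"]

# Integer class codes as listed in the model card.
CLASS_NAMES = {0: "Peripheral", 1: "Intermediate", 2: "Hub"}


def make_row(stop: dict) -> list:
    """Return one feature row in FEATURE_ORDER; raise KeyError if a feature is missing."""
    missing = [name for name in FEATURE_ORDER if name not in stop]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(stop[name]) for name in FEATURE_ORDER]


def decode(predictions) -> list:
    """Map integer predictions (e.g. the output of model.predict) to class names."""
    return [CLASS_NAMES[int(p)] for p in predictions]


stop = {"lat": -23.55, "lon": -46.63, "degree": 8,
        "degree_centrality": 0.00036, "closeness_centrality": 0.018}
X = [make_row(stop)]  # wrap with np.array(X) before calling model.predict
print(X)                    # [[8.0, 0.00036, 0.018, -23.55, -46.63]]
print(decode([0, 2, 1]))    # ['Peripheral', 'Hub', 'Intermediate']
```

Keying inputs by name rather than by position guards against silently swapping, for example, `lat` and `lon`, which the model would otherwise accept without complaint.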
Features
| Feature | Description |
|---|---|
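| degree | Number of direct connections to other stops |
| degree_centrality | Degree normalized by (n - 1), where n is the number of stops in the network |
| closeness_centrality | Closeness centrality (higher values mean shorter average shortest-path distances to other stops) |
| lat | Latitude (decimal degrees, WGS 84) |
| lon | Longitude (decimal degrees, WGS 84) |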
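A minimal sketch of how the three graph features could be computed for one stop, assuming an undirected, unweighted graph stored as an adjacency list and NetworkX-style definitions: degree centrality is degree / (n - 1), and closeness is the inverse mean hop distance scaled by the fraction of nodes reachable (as `networkx.closeness_centrality` does with its default `wf_improved=True`). The card does not say how the original graph was built or which library computed the metrics, so the graph model and the helper names `bfs_distances` and `node_features` are assumptions. The example input in the usage section is consistent with this normalization: 8 / (21,892 - 1) ≈ 0.000365.

```python
from collections import deque


def bfs_distances(adj: dict, source) -> dict:
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_features(adj: dict, node) -> tuple:
    """Return (degree, degree_centrality, closeness_centrality) for one node.

    Assumes an undirected graph stored as a symmetric adjacency list
    with at least two nodes.
    """
    n = len(adj)
    degree = len(adj[node])
    degree_centrality = degree / (n - 1)

    dist = bfs_distances(adj, node)
    reachable = len(dist) - 1   # other nodes reachable from `node`
    total = sum(dist.values())  # sum of hop distances to them
    closeness = 0.0
    if total > 0:
        # Inverse mean distance, scaled by the fraction of nodes reachable
        # (NetworkX-style scaling for disconnected graphs).
        closeness = (reachable / total) * (reachable / (n - 1))
    return degree, degree_centrality, closeness


# Hypothetical toy network: A-B, B-C, B-D, C-D
adj = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C"],
}
print(node_features(adj, "A"))  # degree 1, ~0.333, 0.6
print(node_features(adj, "B"))  # (3, 1.0, 1.0)
```

On a network with tens of thousands of stops, the BFS-per-node closeness computation is itself costly, so in practice these values would be precomputed once for the whole graph.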
Metrics
| Metric | Value |
|---|---|
| F1 Macro (test) | 0.59 |
| Accuracy (test) | 0.68 |
| F1 Macro (5-fold CV) | 0.43 |
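Macro F1 averages the per-class F1 scores with equal weight, so under the class imbalance noted in the limitations it penalizes weak performance on the Hub and Intermediate classes far more than accuracy does, which can explain why macro F1 sits below accuracy. The sketch below computes both from scratch on hypothetical toy labels (not the model's data) for a classifier that always predicts the majority class; it follows the usual definition, as in scikit-learn's `f1_score` with `average="macro"`.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1, 2)) -> float:
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 7 Peripheral (0), 2 Intermediate (1), 1 Hub (2)
y_true = [0] * 7 + [1] * 2 + [2]
y_pred = [0] * 10  # a classifier that always predicts the majority class
print(accuracy(y_true, y_pred))            # 0.7
print(round(macro_f1(y_true, y_pred), 3))  # 0.275
```

The majority-class baseline reaches 0.7 accuracy while scoring only about 0.27 macro F1, which is why macro F1 is the more informative headline metric for this label distribution.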
Feature Importance
| Feature | Importance |
|---|---|
| lat | 0.2793 |
| lon | 0.2604 |
| closeness_centrality | 0.2566 |
| degree | 0.1061 |
| degree_centrality | 0.0976 |
Key Finding
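Geographic position is the strongest predictor of centrality class: latitude and longitude together account for about 54% of total feature importance (0.2793 + 0.2604 ≈ 0.54), although closeness centrality (0.2566) is nearly as important as longitude alone. This is consistent with high-centrality stops concentrating in specific corridors of São Paulo.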
Limitations
- Labels are derived from betweenness centrality quantiles, a simplified classification
- Trained on a single GTFS snapshot, so it may not generalize to changes in the network
- Does not consider temporal patterns (peak vs. off-peak)
- Class imbalance: 66% Peripheral, 24% Intermediate, 10% Hub
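The card does not give the quantile cutoffs used for labeling. Splitting betweenness at the 66th and 90th percentiles would yield roughly the 66% / 24% / 10% proportions above, so the sketch below uses those as hypothetical defaults. Thresholds use linear-interpolation quantiles (the same rule as `numpy.quantile`'s default method); `quantile` and `label_by_quantiles` are illustrative names, not the author's code.

```python
from collections import Counter


def quantile(sorted_vals: list, q: float) -> float:
    """Linear-interpolation quantile of a pre-sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def label_by_quantiles(betweenness: dict, q_low=0.66, q_high=0.90) -> dict:
    """Map node -> 0 (Peripheral), 1 (Intermediate) or 2 (Hub).

    q_low and q_high are hypothetical cutoffs; the model card does not
    state the quantiles actually used.
    """
    vals = sorted(betweenness.values())
    t_low, t_high = quantile(vals, q_low), quantile(vals, q_high)
    labels = {}
    for node, b in betweenness.items():
        if b > t_high:
            labels[node] = 2
        elif b > t_low:
            labels[node] = 1
        else:
            labels[node] = 0
    return labels


# Hypothetical betweenness scores for 100 stops with distinct values
scores = {f"stop_{i}": float(i) for i in range(100)}
labels = label_by_quantiles(scores)
print(Counter(labels.values()))  # Counter({0: 66, 1: 24, 2: 10})
```

Because of the strict `>` comparison, ties at a cutoff fall into the lower class. Many stops may share a betweenness of 0 (route terminals, for example), so tie handling can noticeably shift the class sizes on real data.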
Dataset
SP Transit Network Centrality: 21,892 bus stops with graph centrality metrics.
Citation
```bibtex
@misc{shinoda2026sp-classifier,
  author    = {Cintia Shinoda},
  title     = {SP Transit Node Classifier},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/cintia-shinoda/sp-transit-node-classifier}
}
```