---
license: cc-by-nc-4.0
datasets:
- saifkhichi96/spinetrack
- dfki-av/simspine
base_model:
- Tau-J/RTMPose
base_model_relation: merge
tags:
- 2d-human-pose-estimation
- computer-vision
- keypoint-detection
- spinepose
- spinetrack
language:
- en
---

# Model Card for **SpinePose** Family

**SpinePose** is a family of 2D human pose estimation models trained to estimate a **37-keypoint skeleton**, extending standard human body models to include the **spine**, **pelvis**, and **feet** regions in detail. Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs at inference time, respectively. V1 models were trained on the [SpineTrack dataset](https://huggingface.co/datasets/saifkhichi96/spinetrack) and V2 models were fine-tuned on the [SIMSPINE dataset](https://huggingface.co/datasets/dfki-av/simspine), published in our [CVSports @ CVPR 2025](https://huggingface.co/papers/2504.08110) and [CVPR 2026](https://huggingface.co/papers/2602.20792) papers, respectively.
---

## Model Details

### Description

- **Developed by:** [Muhammad Saif Ullah Khan](https://saifkhichi.com/)
- **Affiliation:** Technical University of Kaiserslautern & [DFKI](https://av.dfki.de/)
- **Funding:** DFKI GmbH
- **Model Type:** Top-down 2D keypoint estimator
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/)
- **Frameworks:** PyTorch, ONNX Runtime
- **Input Resolution:** 256×192 or 384×288 (depending on variant)

### Sources

- **Repository:** [github.com/dfki-av/spinepose](https://github.com/dfki-av/spinepose)
- **Paper:** [CVPR Workshops 2025 (CVSPORTS)](https://openaccess.thecvf.com/content/CVPR2025W/CVSPORTS/html/Khan_Towards_Unconstrained_2D_Pose_Estimation_of_the_Human_Spine_CVPRW_2025_paper.html)
- **Demo:** [saifkhichi.com/research/spinepose](https://www.saifkhichi.com/research/spinepose/)

---

## Intended Uses

### Direct Use

- Human body and spine joint localization from RGB images or videos
- Real-time motion analysis for research, animation, or sports applications
- Augmentation of general-purpose pose estimators for anatomically rich tasks

### Downstream Use

- Integration with clinical posture tracking systems
- 3D pose lifting or musculoskeletal modeling (via the SpineTrack synthetic subset)
- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)

### Out-of-Scope Use

- Any medical diagnosis or treatment application without human oversight
- Full-body 3D reconstruction (requires a separate lifting model)
- Unverified use in safety-critical systems

---

## Bias, Risks, and Limitations

- The models were trained primarily on controlled and synthetic datasets and may underperform on occluded or extreme poses.
- Diversity of body types and cultural attire in the training data is limited.
- Biases are inherited from the COCO/Body8 datasets used to pretrain the teacher models.

### Recommendations

Evaluate the model on your specific domain and retrain or augment with domain-specific samples to mitigate dataset bias.
---

## Getting Started

### Installation

```bash
pip install spinepose
```

On Linux/Windows with CUDA available, install the GPU version:

```bash
pip install spinepose[gpu]
```

### CLI Usage

```bash
spinepose -i /path/to/image_or_video -o /path/to/output
```

This automatically downloads the correct ONNX checkpoint. Run `spinepose -h` for detailed usage options.

### Python API

```python
import cv2
from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)

# Visualize and save the predictions
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
```

For higher-level use:

```python
from spinepose.inference import infer_image, infer_video

# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')

# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
```

## Evaluation

To reproduce results, prepare the following directory layout:

```plaintext
/
├─ data/
│  ├─ spinetrack/
│  ├─ coco/
│  └─ halpe/
└─ checkpoints/
   ├─ spinepose-s_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-m_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-l_32xb256-10e_spinetrack-256x192.pth
   └─ spinepose-x_32xb128-10e_spinetrack-384x288.pth
```

Each PyTorch checkpoint contains both `teacher` and `student` weights; only the `student` is used during inference. Exported ONNX checkpoints contain only the `student`.

### Metrics

We report **Average Precision (AP)** and **Average Recall (AR)** under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.

### Results
| Method | Train Data | Kpts | COCO (AP / AR) | Halpe26 (AP / AR) | Body (AP / AR) | Feet (AP / AR) | Spine (AP / AR) | Overall (AP / AR) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|
| SimCC-MBV2 | COCO | 17 | 62.0 / 67.8 | 33.2 / 43.9 | 72.1 / 75.6 | 0.0 / 0.0 | 0.0 / 0.0 | 0.1 / 0.1 | 2.29 | 0.31 |
| RTMPose-t | Body8 | 26 | 65.9 / 71.3 | 68.0 / 73.2 | 76.9 / 80.0 | 74.1 / 79.7 | 0.0 / 0.0 | 15.8 / 17.9 | 3.51 | 0.37 |
| RTMPose-s | Body8 | 26 | 69.7 / 74.7 | 72.0 / 76.7 | 80.9 / 83.6 | 78.9 / 83.5 | 0.0 / 0.0 | 17.2 / 19.4 | 5.70 | 0.70 |
| SpinePose-s | SpineTrack | 37 | 68.2 / 73.1 | 70.6 / 75.2 | 79.1 / 82.1 | 77.5 / 82.9 | 89.6 / 90.7 | 84.2 / 86.2 | 5.98 | 0.72 |
| SimCC-ViPNAS | COCO | 17 | 69.5 / 75.5 | 36.9 / 49.7 | 79.6 / 83.0 | 0.0 / 0.0 | 0.0 / 0.0 | 0.2 / 0.2 | 8.65 | 0.80 |
| RTMPose-m | Body8 | 26 | 75.1 / 80.0 | 76.7 / 81.3 | 85.5 / 87.9 | 84.1 / 88.2 | 0.0 / 0.0 | 19.4 / 21.4 | 13.93 | 1.95 |
| SpinePose-m | SpineTrack | 37 | 73.0 / 77.5 | 75.0 / 79.2 | 84.0 / 86.4 | 83.5 / 87.4 | 91.4 / 92.5 | 88.0 / 89.5 | 14.34 | 1.98 |
| RTMPose-l | Body8 | 26 | 76.9 / 81.5 | 78.4 / 82.9 | 86.8 / 89.2 | 86.9 / 90.0 | 0.0 / 0.0 | 20.0 / 22.0 | 28.11 | 4.19 |
| RTMW-m | Cocktail14 | 133 | 73.8 / 78.7 | 63.8 / 68.5 | 84.3 / 86.7 | 83.0 / 87.2 | 0.0 / 0.0 | 6.2 / 7.6 | 32.26 | 4.31 |
| SimCC-ResNet50 | COCO | 17 | 72.1 / 78.2 | 38.7 / 51.6 | 81.8 / 85.2 | 0.0 / 0.0 | 0.0 / 0.0 | 0.2 / 0.2 | 36.75 | 5.50 |
| SpinePose-l | SpineTrack | 37 | 75.2 / 79.5 | 77.0 / 81.1 | 85.4 / 87.7 | 85.5 / 89.2 | 91.0 / 92.2 | 88.4 / 90.0 | 28.66 | 4.22 |
| SimCC-ResNet50\* | COCO | 17 | 73.4 / 79.0 | 39.8 / 52.4 | 83.2 / 86.2 | 0.0 / 0.0 | 0.0 / 0.0 | 0.3 / 0.3 | 43.29 | 12.42 |
| RTMPose-x\* | Body8 | 26 | 78.8 / 83.4 | 80.0 / 84.4 | 88.6 / 90.6 | 88.4 / 91.4 | 0.0 / 0.0 | 21.0 / 22.9 | 50.00 | 17.29 |
| RTMW-l\* | Cocktail14 | 133 | 75.6 / 80.4 | 65.4 / 70.1 | 86.0 / 88.3 | 85.6 / 89.2 | 0.0 / 0.0 | 8.1 / 8.1 | 57.20 | 7.91 |
| RTMW-l\* | Cocktail14 | 133 | 77.2 / 82.3 | 66.6 / 71.8 | 87.3 / 89.9 | 88.3 / 91.3 | 0.0 / 0.0 | 8.6 / 8.6 | 57.35 | 17.69 |
| SpinePose-x\* | SpineTrack | 37 | 75.9 / 80.1 | 77.6 / 81.8 | 86.3 / 88.5 | 86.3 / 89.7 | 89.3 / 91.0 | 88.9 / 89.9 | 50.69 | 17.37 |
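For reference, the per-person OKS underlying these AP/AR numbers can be sketched in a few lines of NumPy, following the COCO convention. The `sigmas` values below are illustrative placeholders, not SpineTrack's actual per-keypoint constants:

```python
import numpy as np

def oks(pred, gt, visibility, sigmas, area):
    """Object Keypoint Similarity between one predicted and one
    ground-truth pose (COCO convention).

    pred, gt   : (K, 2) arrays of (x, y) keypoint coordinates
    visibility : (K,) array; only keypoints with v > 0 are scored
    sigmas     : (K,) per-keypoint falloff constants
    area       : ground-truth object area (the scale term s^2)
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)       # squared pixel distances
    k2 = (2 * sigmas) ** 2                       # per-keypoint variances
    e = d2 / (2 * area * k2 + np.spacing(1))     # normalized error
    vis = visibility > 0
    return float(np.mean(np.exp(-e[vis]))) if vis.any() else 0.0

# A perfect prediction scores 1.0; distant predictions decay toward 0.
gt = np.random.rand(37, 2) * 200
sigmas = np.full(37, 0.05)                       # placeholder constants
print(oks(gt, gt, np.ones(37), sigmas, area=5000.0))  # 1.0
```

AP and AR are then computed by thresholding OKS at 0.50:0.05:0.95 and averaging, exactly as in the COCO evaluation protocol.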
## SpineTrack Dataset

The **SpineTrack** dataset comprises both real and synthetic data:

- **SpineTrack-Real**: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
- **SpineTrack-Unreal**: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.

To download:

```bash
git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
```

Alternatively, use `wget` to download the dataset directly:

```bash
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
```

In both cases, the dataset comes as two zipped folders, `annotations` (24.8 MB) and `images` (19.4 GB), which unzip to the following structure:

```plaintext
spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/
```

All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks. The synthetic subset was primarily employed within the **active learning pipeline** used to bootstrap and refine annotations for real-world images. All released **SpinePose** models were trained exclusively on the **real** portion of the dataset.

> [!WARNING]
> A small number of annotations in the synthetic subset are corrupted.
> We recommend avoiding their use until the updated labels are released in the next dataset version.
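Because the annotations are plain COCO-format JSON, they can be inspected without any framework. A minimal sketch, using only the standard library and the standard COCO keypoint schema (the helper name is ours, not part of any released tooling):

```python
import json
from collections import defaultdict

def load_keypoint_annotations(path):
    """Group COCO-format keypoint annotations by image id.

    Returns {image_id: list of poses}, where each pose is a list of
    [x, y, v] triplets (K = 37 for SpineTrack: COCO body joints plus
    spine, pelvis, and feet landmarks).
    """
    with open(path) as f:
        coco = json.load(f)
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        kps = ann["keypoints"]  # flat list [x1, y1, v1, x2, y2, v2, ...]
        triplets = [kps[i:i + 3] for i in range(0, len(kps), 3)]
        by_image[ann["image_id"]].append(triplets)
    return dict(by_image)

# e.g. poses = load_keypoint_annotations(
#     "spinetrack/annotations/person_keypoints_val2017.json")
```

Visibility flags follow COCO semantics (`0` = not labeled, `1` = labeled but occluded, `2` = labeled and visible), so the same loader works for the 17-keypoint COCO files as well.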
## Citation

If you use SpinePose or SpineTrack in your research, please cite:

**BibTeX:**

```bibtex
@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}
```

**APA:**

_Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards unconstrained 2D pose estimation of the human spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171-6180)._

## Model Card Contact

[Muhammad Saif Ullah Khan](mailto:muhammad_saif_ullah.khan@dfki.de)