---
license: cc-by-nc-4.0
datasets:
- saifkhichi96/spinetrack
- dfki-av/simspine
base_model:
- Tau-J/RTMPose
base_model_relation: merge
tags:
- 2d-human-pose-estimation
- computer-vision
- keypoint-detection
- spinepose
- spinetrack
language:
- en
---

# Model Card for **SpinePose** Family

**SpinePose** is a family of 2D human pose estimation models trained to estimate a **37-keypoint skeleton**, extending standard human body models to include the **spine**, **pelvis**, and **feet** regions in detail. Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs at inference time, respectively. V1 models were trained on the [SpineTrack dataset](https://huggingface.co/datasets/saifkhichi96/spinetrack) and V2 models were fine-tuned on the [SIMSPINE dataset](https://huggingface.co/datasets/dfki-av/simspine), published in our [CVSports @ CVPR 2025](https://huggingface.co/papers/2504.08110) and [CVPR 2026](https://huggingface.co/papers/2602.20792) papers, respectively.
---

## Model Details

### Description

- **Developed by:** [Muhammad Saif Ullah Khan](https://saifkhichi.com/)
- **Affiliation:** Technical University of Kaiserslautern & [DFKI](https://av.dfki.de/)
- **Funding:** DFKI GmbH
- **Model Type:** Top-down 2D keypoint estimator
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/)
- **Frameworks:** PyTorch, ONNX Runtime
- **Input Resolution:** 256×192 or 384×288 (depending on variant)

### Sources

- **Repository:** [github.com/dfki-av/spinepose](https://github.com/dfki-av/spinepose)
- **Paper:** [CVPR Workshops 2025 (CVSPORTS)](https://openaccess.thecvf.com/content/CVPR2025W/CVSPORTS/html/Khan_Towards_Unconstrained_2D_Pose_Estimation_of_the_Human_Spine_CVPRW_2025_paper.html)
- **Demo:** [saifkhichi.com/research/spinepose](https://www.saifkhichi.com/research/spinepose/)

---

## Intended Uses

### Direct Use

- Human body and spine joint localization from RGB images or videos
- Real-time motion analysis for research, animation, or sports applications
- Augmentation of general-purpose pose estimators for anatomically rich tasks

### Downstream Use

- Integration with clinical posture tracking systems
- 3D pose lifting or musculoskeletal modeling (via the SpineTrack synthetic subset)
- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)

### Out-of-Scope Use

- Any medical diagnosis or treatment application without human oversight
- Full-body 3D reconstruction (requires a separate lifting model)
- Unverified use in safety-critical systems

---

## Bias, Risks, and Limitations

- The models were trained primarily on controlled and synthetic datasets and may underperform on occluded or extreme poses.
- Diversity of body types and cultural attire in the training data is limited.
- Biases are inherited from the COCO/Body8 datasets used to pretrain the teacher models.

### Recommendations

Evaluate the model on your specific domain and retrain or augment with domain-specific samples to mitigate dataset bias.
---

## Getting Started

### Installation

```bash
pip install spinepose
```

On Linux/Windows with CUDA available, install the GPU version:

```bash
pip install spinepose[gpu]
```

### CLI Usage

```bash
spinepose -i /path/to/image_or_video -o /path/to/output
```

This automatically downloads the correct ONNX checkpoint. Run `spinepose -h` for detailed usage options.

### Python API

```python
import cv2
from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)

# Visualize and save the predictions
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
```

For higher-level use:

```python
from spinepose.inference import infer_image, infer_video

# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')

# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
```

## Evaluation

To reproduce results, prepare the following directory layout:

```plaintext
/
├─ data/
│  ├─ spinetrack/
│  ├─ coco/
│  └─ halpe/
└─ checkpoints/
   ├─ spinepose-s_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-m_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-l_32xb256-10e_spinetrack-256x192.pth
   └─ spinepose-x_32xb128-10e_spinetrack-384x288.pth
```

Each PyTorch checkpoint contains both `teacher` and `student` weights; only the `student` is used during inference. Exported ONNX checkpoints contain only the `student`.

### Metrics

We report **Average Precision (AP)** and **Average Recall (AR)** under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.

### Results
| Method | Train Data | Kpts | COCO (AP / AR) | Halpe26 (AP / AR) | Body (AP / AR) | Feet (AP / AR) | Spine (AP / AR) | Overall (AP / AR) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|
| SimCC-MBV2 | COCO | 17 | 62.0 / 67.8 | 33.2 / 43.9 | 72.1 / 75.6 | 0.0 / 0.0 | 0.0 / 0.0 | 0.1 / 0.1 | 2.29 | 0.31 |
| RTMPose-t | Body8 | 26 | 65.9 / 71.3 | 68.0 / 73.2 | 76.9 / 80.0 | 74.1 / 79.7 | 0.0 / 0.0 | 15.8 / 17.9 | 3.51 | 0.37 |
| RTMPose-s | Body8 | 26 | 69.7 / 74.7 | 72.0 / 76.7 | 80.9 / 83.6 | 78.9 / 83.5 | 0.0 / 0.0 | 17.2 / 19.4 | 5.70 | 0.70 |
| SpinePose-s | SpineTrack | 37 | 68.2 / 73.1 | 70.6 / 75.2 | 79.1 / 82.1 | 77.5 / 82.9 | 89.6 / 90.7 | 84.2 / 86.2 | 5.98 | 0.72 |
| SimCC-ViPNAS | COCO | 17 | 69.5 / 75.5 | 36.9 / 49.7 | 79.6 / 83.0 | 0.0 / 0.0 | 0.0 / 0.0 | 0.2 / 0.2 | 8.65 | 0.80 |
| RTMPose-m | Body8 | 26 | 75.1 / 80.0 | 76.7 / 81.3 | 85.5 / 87.9 | 84.1 / 88.2 | 0.0 / 0.0 | 19.4 / 21.4 | 13.93 | 1.95 |
| SpinePose-m | SpineTrack | 37 | 73.0 / 77.5 | 75.0 / 79.2 | 84.0 / 86.4 | 83.5 / 87.4 | 91.4 / 92.5 | 88.0 / 89.5 | 14.34 | 1.98 |
| RTMPose-l | Body8 | 26 | 76.9 / 81.5 | 78.4 / 82.9 | 86.8 / 89.2 | 86.9 / 90.0 | 0.0 / 0.0 | 20.0 / 22.0 | 28.11 | 4.19 |
| RTMW-m | Cocktail14 | 133 | 73.8 / 78.7 | 63.8 / 68.5 | 84.3 / 86.7 | 83.0 / 87.2 | 0.0 / 0.0 | 6.2 / 7.6 | 32.26 | 4.31 |
| SimCC-ResNet50 | COCO | 17 | 72.1 / 78.2 | 38.7 / 51.6 | 81.8 / 85.2 | 0.0 / 0.0 | 0.0 / 0.0 | 0.2 / 0.2 | 36.75 | 5.50 |
| SpinePose-l | SpineTrack | 37 | 75.2 / 79.5 | 77.0 / 81.1 | 85.4 / 87.7 | 85.5 / 89.2 | 91.0 / 92.2 | 88.4 / 90.0 | 28.66 | 4.22 |
| SimCC-ResNet50\* | COCO | 17 | 73.4 / 79.0 | 39.8 / 52.4 | 83.2 / 86.2 | 0.0 / 0.0 | 0.0 / 0.0 | 0.3 / 0.3 | 43.29 | 12.42 |
| RTMPose-x\* | Body8 | 26 | 78.8 / 83.4 | 80.0 / 84.4 | 88.6 / 90.6 | 88.4 / 91.4 | 0.0 / 0.0 | 21.0 / 22.9 | 50.00 | 17.29 |
| RTMW-l\* | Cocktail14 | 133 | 75.6 / 80.4 | 65.4 / 70.1 | 86.0 / 88.3 | 85.6 / 89.2 | 0.0 / 0.0 | 8.1 / 8.1 | 57.20 | 7.91 |
| RTMW-l\* | Cocktail14 | 133 | 77.2 / 82.3 | 66.6 / 71.8 | 87.3 / 89.9 | 88.3 / 91.3 | 0.0 / 0.0 | 8.6 / 8.6 | 57.35 | 17.69 |
| SpinePose-x\* | SpineTrack | 37 | 75.9 / 80.1 | 77.6 / 81.8 | 86.3 / 88.5 | 86.3 / 89.7 | 89.3 / 91.0 | 88.9 / 89.9 | 50.69 | 17.37 |
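For reference, the per-person OKS underlying these AP/AR numbers can be sketched in a few lines of NumPy, following the COCO convention. The `sigmas` values below are illustrative placeholders, not SpineTrack's actual per-keypoint constants:

```python
import numpy as np

def oks(pred, gt, visibility, sigmas, area):
    """Object Keypoint Similarity between one predicted and one
    ground-truth pose (COCO convention).

    pred, gt   : (K, 2) arrays of (x, y) keypoint coordinates
    visibility : (K,) array; only keypoints with v > 0 are scored
    sigmas     : (K,) per-keypoint falloff constants
    area       : ground-truth object area (the scale term s^2)
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)       # squared pixel distances
    k2 = (2 * sigmas) ** 2                       # per-keypoint variances
    e = d2 / (2 * area * k2 + np.spacing(1))     # normalized error
    vis = visibility > 0
    return float(np.mean(np.exp(-e[vis]))) if vis.any() else 0.0

# A perfect prediction scores 1.0; distant predictions decay toward 0.
gt = np.random.rand(37, 2) * 200
sigmas = np.full(37, 0.05)                       # placeholder constants
print(oks(gt, gt, np.ones(37), sigmas, area=5000.0))  # 1.0
```

AP and AR are then computed by thresholding OKS at 0.50:0.05:0.95 and averaging, exactly as in the COCO evaluation protocol.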
## SpineTrack Dataset

The **SpineTrack** dataset comprises both real and synthetic data:

- **SpineTrack-Real**: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
- **SpineTrack-Unreal**: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.

To download:

```bash
git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
```

Alternatively, use `wget` to download the dataset directly:

```bash
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
```

In both cases, the dataset comes as two zipped folders, `annotations` (24.8 MB) and `images` (19.4 GB), which unzip to the following structure:

```plaintext
spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/
```

All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks. The synthetic subset was primarily employed within the **active learning pipeline** used to bootstrap and refine annotations for real-world images. All released **SpinePose** models were trained exclusively on the **real** portion of the dataset.

> [!WARNING]
> A small number of annotations in the synthetic subset are corrupted.
> We recommend avoiding their use until the updated labels are released in the next dataset version.
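Because the annotations are plain COCO-format JSON, they can be inspected without any framework. A minimal sketch, using only the standard library and the standard COCO keypoint schema (the helper name is ours, not part of any released tooling):

```python
import json
from collections import defaultdict

def load_keypoint_annotations(path):
    """Group COCO-format keypoint annotations by image id.

    Returns {image_id: list of poses}, where each pose is a list of
    [x, y, v] triplets (K = 37 for SpineTrack: COCO body joints plus
    spine, pelvis, and feet landmarks).
    """
    with open(path) as f:
        coco = json.load(f)
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        kps = ann["keypoints"]  # flat list [x1, y1, v1, x2, y2, v2, ...]
        triplets = [kps[i:i + 3] for i in range(0, len(kps), 3)]
        by_image[ann["image_id"]].append(triplets)
    return dict(by_image)

# e.g. poses = load_keypoint_annotations(
#     "spinetrack/annotations/person_keypoints_val2017.json")
```

Visibility flags follow COCO semantics (`0` = not labeled, `1` = labeled but occluded, `2` = labeled and visible), so the same loader works for the 17-keypoint COCO files as well.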
## Citation

If you use SpinePose or SpineTrack in your research, please cite:

**BibTeX:**

```bibtex
@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}
```

**APA:**

_Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards unconstrained 2D pose estimation of the human spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171-6180)._

## Model Card Contact

[Muhammad Saif Ullah Khan](mailto:muhammad_saif_ullah.khan@dfki.de)