---
license: cc-by-nc-4.0
datasets:
- saifkhichi96/spinetrack
base_model:
- Tau-J/RTMPose
base_model_relation: merge
tags:
- 2d-human-pose-estimation
- computer-vision
- keypoint-detection
- spinepose
- spinetrack
language:
- en
---
# Model Card for **SpinePose** Family
**SpinePose** is a family of 2D human pose estimation models trained to estimate a **37-keypoint skeleton**, extending standard human body models to include the **spine**, **pelvis**, and **feet** regions in detail.
Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs, respectively, at inference time.
---
## Model Details
### Description
- **Developed by:** [Muhammad Saif Ullah Khan](https://saifkhichi.com/)
- **Affiliation:** Technical University of Kaiserslautern & [DFKI](https://av.dfki.de/)
- **Funding:** DFKI GmbH
- **Model Type:** Top-down 2D keypoint estimator
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/)
- **Frameworks:** PyTorch, ONNX Runtime
- **Input Resolution:** 256×192 or 384×288 (depending on variant)
### Sources
- **Repository:** [github.com/dfki-av/spinepose](https://github.com/dfki-av/spinepose)
- **Paper:** [CVPR Workshops 2025 (CVSPORTS)](https://openaccess.thecvf.com/content/CVPR2025W/CVSPORTS/html/Khan_Towards_Unconstrained_2D_Pose_Estimation_of_the_Human_Spine_CVPRW_2025_paper.html)
- **Demo:** [saifkhichi.com/research/spinepose](https://www.saifkhichi.com/research/spinepose/)
---
## Intended Uses
### Direct Use
- Human body and spine joint localization from RGB images or videos
- Real-time motion analysis for research, animation, or sports applications
- Augmentation of general-purpose pose estimators for anatomically rich tasks
### Downstream Use
- Integration with clinical posture tracking systems
- 3D pose lifting or musculoskeletal modeling (via SpineTrack synthetic subset)
- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)
### Out-of-Scope Use
- Any medical diagnosis or treatment application without human oversight
- Full-body 3D reconstruction (requires separate lifting model)
- Unverified use in safety-critical systems
---
## Bias, Risks, and Limitations
- Model trained primarily on controlled and synthetic datasets; may underperform in occluded or extreme poses.
- Limited diversity in body types and cultural attire representation.
- Bias inherited from COCO/Body8 datasets used for pretraining the teachers.
### Recommendations
Evaluate the model on your specific domain and retrain or augment using domain-specific samples to mitigate dataset bias.
---
## Getting Started
### Installation
```bash
pip install spinepose
```
On Linux/Windows with CUDA available, install the GPU version:
```bash
pip install spinepose[gpu]
```
### CLI Usage
```bash
spinepose -i /path/to/image_or_video -o /path/to/output
```
This automatically downloads the correct ONNX checkpoint.
Run `spinepose -h` for detailed usage options.
### Python API
```python
import cv2
from spinepose import SpinePoseEstimator
# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')
# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
```
For higher-level use:
```python
from spinepose.inference import infer_image, infer_video
# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')
# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
```
## Evaluation
To reproduce results, prepare the following directory layout:
```plaintext
<PROJECT_DIR>/
├── data/
│   ├── spinetrack/
│   ├── coco/
│   └── halpe/
└── checkpoints/
    ├── spinepose-s_32xb256-10e_spinetrack-256x192.pth
    ├── spinepose-m_32xb256-10e_spinetrack-256x192.pth
    ├── spinepose-l_32xb256-10e_spinetrack-256x192.pth
    └── spinepose-x_32xb128-10e_spinetrack-384x288.pth
```
Each PyTorch checkpoint contains both `teacher` and `student` weights; only the `student` branch is used at inference time. The exported ONNX checkpoints contain only the `student`.
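If you only need the student weights from a downloaded PyTorch checkpoint, they can be filtered out of the state dict. A minimal sketch, assuming the teacher weights live under a `teacher.` key prefix (the exact prefix in the released checkpoints is an assumption, so inspect the keys first):

```python
def filter_student(state_dict):
    """Keep only student weights from a distillation checkpoint state_dict.

    Assumes teacher weights are stored under a 'teacher.' key prefix; the
    actual prefix in the released checkpoints may differ.
    """
    return {k: v for k, v in state_dict.items() if not k.startswith("teacher.")}

# Toy state_dict standing in for real tensors:
state = {
    "teacher.backbone.conv.weight": [0.0],
    "student.backbone.conv.weight": [0.0],
    "student.head.fc.weight": [0.0],
}
print(sorted(filter_student(state)))
# → ['student.backbone.conv.weight', 'student.head.fc.weight']
```

In practice you would apply this to the result of `torch.load(ckpt_path, map_location="cpu")["state_dict"]` and re-save the filtered dict with `torch.save`.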
### Metrics
We report **Average Precision (AP)** and **Average Recall (AR)** under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.
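For reference, a single OKS value follows the COCO definition. The sketch below implements that formula; the per-keypoint sigmas for the spine and feet keypoints are not published in this card, so any values passed in are an assumption:

```python
import numpy as np

def oks(pred, gt, vis, area, sigmas):
    """Object Keypoint Similarity for one instance, COCO-style.

    pred, gt: (K, 2) arrays of keypoint coordinates; vis: (K,) visibility
    flags; area: ground-truth object area; sigmas: (K,) per-keypoint
    constants (assumed values for the SpineTrack-specific keypoints).
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)  # squared pixel distances
    e = d2 / ((2 * sigmas) ** 2 * (area + np.spacing(1)) * 2)
    mask = vis > 0
    if not mask.any():
        return 0.0
    return float(np.exp(-e[mask]).mean())

# A perfect prediction on visible keypoints yields OKS = 1.0:
gt = np.array([[10.0, 20.0], [30.0, 40.0]])
print(oks(gt, gt, np.ones(2), area=500.0, sigmas=np.array([0.05, 0.05])))
# → 1.0
```

AP and AR are then obtained by thresholding OKS across the instances of a split, exactly as in the COCO evaluation protocol.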
### Results
<table border="1" cellspacing="0" cellpadding="6" style="border-collapse:collapse; text-align:center; font-family:Arial; font-size:13px;">
<thead style="background-color:#f0f0f0; font-weight:bold;">
<tr>
<th>Method</th>
<th>Train Data</th>
<th>Kpts</th>
<th colspan="2">COCO</th>
<th colspan="2">Halpe26</th>
<th colspan="2">Body</th>
<th colspan="2">Feet</th>
<th colspan="2">Spine</th>
<th colspan="2">Overall</th>
<th>Params (M)</th>
<th>FLOPs (G)</th>
</tr>
<tr>
<th></th><th></th><th></th>
<th>AP</th><th>AR</th>
<th>AP</th><th>AR</th>
<th>AP</th><th>AR</th>
<th>AP</th><th>AR</th>
<th>AP</th><th>AR</th>
<th>AP</th><th>AR</th>
<th></th><th></th>
</tr>
</thead>
<tbody>
<tr><td>SimCC-MBV2</td><td>COCO</td><td>17</td><td>62.0</td><td>67.8</td><td>33.2</td><td>43.9</td><td>72.1</td><td>75.6</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.1</td><td>0.1</td><td>2.29</td><td>0.31</td></tr>
<tr><td>RTMPose-t</td><td>Body8</td><td>26</td><td>65.9</td><td>71.3</td><td>68.0</td><td>73.2</td><td>76.9</td><td>80.0</td><td>74.1</td><td>79.7</td><td>0.0</td><td>0.0</td><td>15.8</td><td>17.9</td><td>3.51</td><td>0.37</td></tr>
<tr><td>RTMPose-s</td><td>Body8</td><td>26</td><td>69.7</td><td>74.7</td><td>72.0</td><td>76.7</td><td>80.9</td><td>83.6</td><td>78.9</td><td>83.5</td><td>0.0</td><td>0.0</td><td>17.2</td><td>19.4</td><td>5.70</td><td>0.70</td></tr>
<tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-s</td><td>SpineTrack</td><td>37</td><td>68.2</td><td>73.1</td><td>70.6</td><td>75.2</td><td>79.1</td><td>82.1</td><td>77.5</td><td>82.9</td><td>89.6</td><td>90.7</td><td>84.2</td><td>86.2</td><td>5.98</td><td>0.72</td></tr>
<tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
<tr><td>SimCC-ViPNAS</td><td>COCO</td><td>17</td><td>69.5</td><td>75.5</td><td>36.9</td><td>49.7</td><td>79.6</td><td>83.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.2</td><td>0.2</td><td>8.65</td><td>0.80</td></tr>
<tr><td>RTMPose-m</td><td>Body8</td><td>26</td><td>75.1</td><td>80.0</td><td>76.7</td><td>81.3</td><td>85.5</td><td>87.9</td><td>84.1</td><td>88.2</td><td>0.0</td><td>0.0</td><td>19.4</td><td>21.4</td><td>13.93</td><td>1.95</td></tr>
<tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-m</td><td>SpineTrack</td><td>37</td><td>73.0</td><td>77.5</td><td>75.0</td><td>79.2</td><td>84.0</td><td>86.4</td><td>83.5</td><td>87.4</td><td>91.4</td><td>92.5</td><td>88.0</td><td>89.5</td><td>14.34</td><td>1.98</td></tr>
<tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
<tr><td>RTMPose-l</td><td>Body8</td><td>26</td><td>76.9</td><td>81.5</td><td>78.4</td><td>82.9</td><td>86.8</td><td>89.2</td><td>86.9</td><td>90.0</td><td>0.0</td><td>0.0</td><td>20.0</td><td>22.0</td><td>28.11</td><td>4.19</td></tr>
<tr><td>RTMW-m</td><td>Cocktail14</td><td>133</td><td>73.8</td><td>78.7</td><td>63.8</td><td>68.5</td><td>84.3</td><td>86.7</td><td>83.0</td><td>87.2</td><td>0.0</td><td>0.0</td><td>6.2</td><td>7.6</td><td>32.26</td><td>4.31</td></tr>
<tr><td>SimCC-ResNet50</td><td>COCO</td><td>17</td><td>72.1</td><td>78.2</td><td>38.7</td><td>51.6</td><td>81.8</td><td>85.2</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.2</td><td>0.2</td><td>36.75</td><td>5.50</td></tr>
<tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-l</td><td>SpineTrack</td><td>37</td><td>75.2</td><td>79.5</td><td>77.0</td><td>81.1</td><td>85.4</td><td>87.7</td><td>85.5</td><td>89.2</td><td>91.0</td><td>92.2</td><td>88.4</td><td>90.0</td><td>28.66</td><td>4.22</td></tr>
<tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
<tr><td>SimCC-ResNet50*</td><td>COCO</td><td>17</td><td>73.4</td><td>79.0</td><td>39.8</td><td>52.4</td><td>83.2</td><td>86.2</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.3</td><td>0.3</td><td>43.29</td><td>12.42</td></tr>
<tr><td>RTMPose-x*</td><td>Body8</td><td>26</td><td>78.8</td><td>83.4</td><td>80.0</td><td>84.4</td><td>88.6</td><td>90.6</td><td>88.4</td><td>91.4</td><td>0.0</td><td>0.0</td><td>21.0</td><td>22.9</td><td>50.00</td><td>17.29</td></tr>
<tr><td>RTMW-l*</td><td>Cocktail14</td><td>133</td><td>75.6</td><td>80.4</td><td>65.4</td><td>70.1</td><td>86.0</td><td>88.3</td><td>85.6</td><td>89.2</td><td>0.0</td><td>0.0</td><td>8.1</td><td>8.1</td><td>57.20</td><td>7.91</td></tr>
<tr><td>RTMW-l*</td><td>Cocktail14</td><td>133</td><td>77.2</td><td>82.3</td><td>66.6</td><td>71.8</td><td>87.3</td><td>89.9</td><td>88.3</td><td>91.3</td><td>0.0</td><td>0.0</td><td>8.6</td><td>8.6</td><td>57.35</td><td>17.69</td></tr>
<tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-x*</td><td>SpineTrack</td><td>37</td><td>75.9</td><td>80.1</td><td>77.6</td><td>81.8</td><td>86.3</td><td>88.5</td><td>86.3</td><td>89.7</td><td>89.3</td><td>91.0</td><td>88.9</td><td>89.9</td><td>50.69</td><td>17.37</td></tr>
</tbody>
</table>
## SpineTrack Dataset
The **SpineTrack** dataset comprises both real and synthetic data:
- **SpineTrack-Real**: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
- **SpineTrack-Unreal**: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.
To download:
```bash
git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
```
Alternatively, use `wget` to download the dataset directly:
```bash
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
```
In both cases, the download consists of two zip archives, `annotations.zip` (24.8 MB) and `images.zip` (19.4 GB), which can be unzipped to obtain the following structure:
```plaintext
spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/
```
All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks.
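Because the files use standard COCO keypoint triplets, a single annotation can be parsed with no framework at all. A minimal sketch (the toy entry below is illustrative; real SpineTrack annotations carry 37 keypoints, i.e. a `keypoints` list of 111 numbers):

```python
def parse_keypoints(ann):
    """Split a COCO-format 'keypoints' list into (x, y, visibility) triplets."""
    kpts = ann["keypoints"]
    return [(kpts[i], kpts[i + 1], kpts[i + 2]) for i in range(0, len(kpts), 3)]

# Toy two-keypoint annotation standing in for a real SpineTrack entry:
ann = {"keypoints": [12.0, 34.0, 2, 0.0, 0.0, 0]}
print(parse_keypoints(ann))
# → [(12.0, 34.0, 2), (0.0, 0.0, 0)]
```

Visibility follows COCO conventions: 0 means not labelled, 1 labelled but occluded, 2 labelled and visible.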
The synthetic subset was primarily employed within the **active learning pipeline** used to bootstrap and refine annotations for real-world images.
All released **SpinePose** models were trained exclusively on the **real** portion of the dataset.
> [!WARNING]
> A small number of annotations in the synthetic subset are corrupted.
> We recommend avoiding their use until the updated labels are released in the next dataset version.
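Until the updated labels ship, a simple heuristic filter can screen out obviously malformed entries. This is only an illustrative sketch; the actual corruption modes are not documented here, so the checks below are assumptions:

```python
def looks_valid(ann, num_kpts=37):
    """Heuristic sanity check for one COCO-format keypoint annotation.

    Illustrative only: the specific failure modes in the synthetic subset
    are not documented, so this will not catch every kind of corruption.
    """
    kpts = ann.get("keypoints", [])
    if len(kpts) != 3 * num_kpts:
        return False
    xs, ys, vs = kpts[0::3], kpts[1::3], kpts[2::3]
    if any(v not in (0, 1, 2) for v in vs):
        return False
    # Labelled-visible joints should have non-negative coordinates.
    return all(x >= 0 and y >= 0 for x, y, v in zip(xs, ys, vs) if v > 0)

# Toy two-keypoint examples (pass num_kpts=2 to match):
good = {"keypoints": [12.0, 34.0, 2, 0.0, 0.0, 0]}
bad = {"keypoints": [12.0, -5.0, 2, 0.0, 0.0, 0]}
print(looks_valid(good, num_kpts=2), looks_valid(bad, num_kpts=2))
# → True False
```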
## Citation
If you use SpinePose or SpineTrack in your research, please cite:
**BibTeX:**
```bibtex
@InProceedings{Khan_2025_CVPR,
author = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
title = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2025},
pages = {6171-6180}
}
```
**APA:**
_Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards Unconstrained 2D Pose Estimation of the Human Spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171-6180)._
## Model Card Contact
[Muhammad Saif Ullah Khan](mailto:muhammad_saif_ullah.khan@dfki.de)