---
license: cc-by-nc-4.0
datasets:
- saifkhichi96/spinetrack
base_model:
- Tau-J/RTMPose
base_model_relation: merge
tags:
- 2d-human-pose-estimation
- computer-vision
- keypoint-detection
- spinepose
- spinetrack
language:
- en
---

# Model Card for **SpinePose** Family

**SpinePose** is a family of 2D human pose estimation models trained to estimate a **37-keypoint skeleton**, extending standard human body models to include the **spine**, **pelvis**, and **feet** regions in detail.  

Four SpinePose variants (small, medium, large, and x-large) are available, requiring 0.72, 1.98, 4.22, and 17.37 GFLOPs at inference time, respectively.

---

## Model Details

### Description

- **Developed by:** [Muhammad Saif Ullah Khan](https://saifkhichi.com/)
- **Affiliation:** Technical University of Kaiserslautern & [DFKI](https://av.dfki.de/)
- **Funding:** DFKI GmbH
- **Model Type:** Top-down 2D keypoint estimator
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/)
- **Frameworks:** PyTorch, ONNX Runtime
- **Input Resolution:** 256×192 or 384×288 (depending on variant)

### Sources

- **Repository:** [github.com/dfki-av/spinepose](https://github.com/dfki-av/spinepose)  
- **Paper:** [CVPR Workshops 2025 (CVSPORTS)](https://openaccess.thecvf.com/content/CVPR2025W/CVSPORTS/html/Khan_Towards_Unconstrained_2D_Pose_Estimation_of_the_Human_Spine_CVPRW_2025_paper.html)  
- **Demo:** [saifkhichi.com/research/spinepose](https://www.saifkhichi.com/research/spinepose/)  

---

## Intended Uses

### Direct Use
- Human body and spine joint localization from RGB images or videos  
- Real-time motion analysis for research, animation, or sports applications  
- Augmentation of general-purpose pose estimators for anatomically rich tasks  

### Downstream Use
- Integration with clinical posture tracking systems  
- 3D pose lifting or musculoskeletal modeling (via SpineTrack synthetic subset)  
- Fine-tuning on domain-specific datasets (industrial, rehabilitation, yoga)  

### Out-of-Scope Use
- Any medical diagnosis or treatment application without human oversight  
- Full-body 3D reconstruction (requires separate lifting model)  
- Unverified use in safety-critical systems  

---

## Bias, Risks, and Limitations

- Model trained primarily on controlled and synthetic datasets; may underperform in occluded or extreme poses.  
- Limited diversity in body types and cultural attire representation.  
- Bias inherited from COCO/Body8 datasets used for pretraining the teachers.  

### Recommendations
Evaluate the model on your specific domain and retrain or augment using domain-specific samples to mitigate dataset bias.

---

## Getting Started

### Installation

```bash
pip install spinepose
```

On Linux/Windows with CUDA available, install the GPU version:

```bash
pip install spinepose[gpu]
```

### CLI Usage

```bash
spinepose -i /path/to/image_or_video -o /path/to/output
```

This automatically downloads the correct ONNX checkpoint.
Run `spinepose -h` for detailed usage options.

### Python API

```python
import cv2
from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator.predict(image)
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
```

For higher-level use:

```python
from spinepose.inference import infer_image, infer_video

# Single image inference
infer_image('path/to/image.jpg', 'output.jpg')

# Video inference with optional temporal smoothing
infer_video('path/to/video.mp4', 'output_video.mp4', use_smoothing=True)
```
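The `predict` call above returns per-person keypoint coordinates and per-joint confidence scores. A common post-processing step is to mask out low-confidence joints before downstream averaging or plotting. The sketch below assumes arrays shaped `(N, 37, 2)` and `(N, 37)`; both the shapes and the threshold are illustrative assumptions, not part of the documented API:

```python
import numpy as np

def filter_low_confidence(keypoints, scores, thresh=0.3):
    """Replace joints scoring below `thresh` with NaN so they are
    easy to exclude from later statistics or plotting."""
    kpts = keypoints.astype(float).copy()
    kpts[scores < thresh] = np.nan  # boolean mask over (person, joint)
    return kpts

# Toy data: one person, three joints (a real output would have 37 joints)
kpts = np.array([[[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]])
scores = np.array([[0.9, 0.1, 0.8]])
filtered = filter_low_confidence(kpts, scores)
```

Using NaN rather than zeroing the coordinates keeps masked joints from silently biasing later coordinate averages.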

## Evaluation

To reproduce results, prepare the following directory layout:

```plaintext
<PROJECT_DIR>/
├─ data/
│  ├─ spinetrack/
│  ├─ coco/
│  └─ halpe/
└─ checkpoints/
   ├─ spinepose-s_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-m_32xb256-10e_spinetrack-256x192.pth
   ├─ spinepose-l_32xb256-10e_spinetrack-256x192.pth
   └─ spinepose-x_32xb128-10e_spinetrack-384x288.pth
```

Each PyTorch checkpoint contains both `teacher` and `student` weights; only the `student` is used during inference. Exported ONNX checkpoints contain only the `student`.
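A student-only checkpoint can be derived by filtering the state dict on a key prefix. The `"student."`/`"teacher."` prefixes in this sketch are assumptions for illustration (a toy dict stands in for `torch.load(...)["state_dict"]`); inspect your checkpoint's actual `state_dict` keys before relying on them:

```python
# Hypothetical illustration of splitting teacher/student weights; the real
# key prefixes in SpinePose checkpoints may differ -- inspect state_dict keys.
def split_student(state_dict, prefix="student."):
    """Keep only entries under `prefix`, stripping the prefix itself."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy state dict standing in for torch.load(ckpt_path)["state_dict"]
toy = {"teacher.head.weight": 1,
       "student.head.weight": 2,
       "student.backbone.conv.weight": 3}
student = split_student(toy)
```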

### Metrics

We report **Average Precision (AP)** and **Average Recall (AR)** under varying Object Keypoint Similarity (OKS) thresholds, consistent with COCO conventions but extended to the 37-keypoint SpineTrack format.
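For reference, OKS applies a per-joint Gaussian falloff to the squared prediction error, normalized by object area and a per-joint tolerance sigma. The sketch below mirrors the pycocotools convention; the uniform sigma is a placeholder, since the actual per-joint sigmas for the 37-keypoint layout are not listed in this card:

```python
import numpy as np

def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one predicted and one
    ground-truth pose, averaged over visible joints."""
    d2 = np.sum((pred - gt) ** 2, axis=-1)        # squared joint distances
    k2 = (2 * sigmas) ** 2                        # per-joint tolerance
    e = d2 / (k2 * area * 2 + np.spacing(1))      # normalized error
    vis = visible > 0
    return float(np.exp(-e)[vis].sum() / max(vis.sum(), 1))

# A perfect prediction scores 1.0 regardless of sigma choice
gt = np.zeros((37, 2))
score = oks(gt.copy(), gt, np.ones(37), area=100.0, sigmas=np.full(37, 0.05))
```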

### Results

<table border="1" cellspacing="0" cellpadding="6" style="border-collapse:collapse; text-align:center; font-family:Arial; font-size:13px;">
  <thead style="background-color:#f0f0f0; font-weight:bold;">
    <tr>
      <th>Method</th>
      <th>Train Data</th>
      <th>Kpts</th>
      <th colspan="2">COCO</th>
      <th colspan="2">Halpe26</th>
      <th colspan="2">Body</th>
      <th colspan="2">Feet</th>
      <th colspan="2">Spine</th>
      <th colspan="2">Overall</th>
      <th>Params (M)</th>
      <th>FLOPs (G)</th>
    </tr>
    <tr>
      <th></th><th></th><th></th>
      <th>AP</th><th>AR</th>
      <th>AP</th><th>AR</th>
      <th>AP</th><th>AR</th>
      <th>AP</th><th>AR</th>
      <th>AP</th><th>AR</th>
      <th>AP</th><th>AR</th>
      <th></th><th></th>
    </tr>
  </thead>
  <tbody>
    <tr><td>SimCC-MBV2</td><td>COCO</td><td>17</td><td>62.0</td><td>67.8</td><td>33.2</td><td>43.9</td><td>72.1</td><td>75.6</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.1</td><td>0.1</td><td>2.29</td><td>0.31</td></tr>
    <tr><td>RTMPose-t</td><td>Body8</td><td>26</td><td>65.9</td><td>71.3</td><td>68.0</td><td>73.2</td><td>76.9</td><td>80.0</td><td>74.1</td><td>79.7</td><td>0.0</td><td>0.0</td><td>15.8</td><td>17.9</td><td>3.51</td><td>0.37</td></tr>
    <tr><td>RTMPose-s</td><td>Body8</td><td>26</td><td>69.7</td><td>74.7</td><td>72.0</td><td>76.7</td><td>80.9</td><td>83.6</td><td>78.9</td><td>83.5</td><td>0.0</td><td>0.0</td><td>17.2</td><td>19.4</td><td>5.70</td><td>0.70</td></tr>
    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-s</td><td>SpineTrack</td><td>37</td><td>68.2</td><td>73.1</td><td>70.6</td><td>75.2</td><td>79.1</td><td>82.1</td><td>77.5</td><td>82.9</td><td>89.6</td><td>90.7</td><td>84.2</td><td>86.2</td><td>5.98</td><td>0.72</td></tr>
    <tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
    <tr><td>SimCC-ViPNAS</td><td>COCO</td><td>17</td><td>69.5</td><td>75.5</td><td>36.9</td><td>49.7</td><td>79.6</td><td>83.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.2</td><td>0.2</td><td>8.65</td><td>0.80</td></tr>
    <tr><td>RTMPose-m</td><td>Body8</td><td>26</td><td>75.1</td><td>80.0</td><td>76.7</td><td>81.3</td><td>85.5</td><td>87.9</td><td>84.1</td><td>88.2</td><td>0.0</td><td>0.0</td><td>19.4</td><td>21.4</td><td>13.93</td><td>1.95</td></tr>
    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-m</td><td>SpineTrack</td><td>37</td><td>73.0</td><td>77.5</td><td>75.0</td><td>79.2</td><td>84.0</td><td>86.4</td><td>83.5</td><td>87.4</td><td>91.4</td><td>92.5</td><td>88.0</td><td>89.5</td><td>14.34</td><td>1.98</td></tr>
    <tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
    <tr><td>RTMPose-l</td><td>Body8</td><td>26</td><td>76.9</td><td>81.5</td><td>78.4</td><td>82.9</td><td>86.8</td><td>89.2</td><td>86.9</td><td>90.0</td><td>0.0</td><td>0.0</td><td>20.0</td><td>22.0</td><td>28.11</td><td>4.19</td></tr>
    <tr><td>RTMW-m</td><td>Cocktail14</td><td>133</td><td>73.8</td><td>78.7</td><td>63.8</td><td>68.5</td><td>84.3</td><td>86.7</td><td>83.0</td><td>87.2</td><td>0.0</td><td>0.0</td><td>6.2</td><td>7.6</td><td>32.26</td><td>4.31</td></tr>
    <tr><td>SimCC-ResNet50</td><td>COCO</td><td>17</td><td>72.1</td><td>78.2</td><td>38.7</td><td>51.6</td><td>81.8</td><td>85.2</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.2</td><td>0.2</td><td>36.75</td><td>5.50</td></tr>
    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-l</td><td>SpineTrack</td><td>37</td><td>75.2</td><td>79.5</td><td>77.0</td><td>81.1</td><td>85.4</td><td>87.7</td><td>85.5</td><td>89.2</td><td>91.0</td><td>92.2</td><td>88.4</td><td>90.0</td><td>28.66</td><td>4.22</td></tr>
    <tr><td colspan="17" style="background-color:#d0d0d0; height:3px;"></td></tr>
    <tr><td>SimCC-ResNet50*</td><td>COCO</td><td>17</td><td>73.4</td><td>79.0</td><td>39.8</td><td>52.4</td><td>83.2</td><td>86.2</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.3</td><td>0.3</td><td>43.29</td><td>12.42</td></tr>
    <tr><td>RTMPose-x*</td><td>Body8</td><td>26</td><td>78.8</td><td>83.4</td><td>80.0</td><td>84.4</td><td>88.6</td><td>90.6</td><td>88.4</td><td>91.4</td><td>0.0</td><td>0.0</td><td>21.0</td><td>22.9</td><td>50.00</td><td>17.29</td></tr>
    <tr><td>RTMW-l*</td><td>Cocktail14</td><td>133</td><td>75.6</td><td>80.4</td><td>65.4</td><td>70.1</td><td>86.0</td><td>88.3</td><td>85.6</td><td>89.2</td><td>0.0</td><td>0.0</td><td>8.1</td><td>8.1</td><td>57.20</td><td>7.91</td></tr>
    <tr><td>RTMW-l*</td><td>Cocktail14</td><td>133</td><td>77.2</td><td>82.3</td><td>66.6</td><td>71.8</td><td>87.3</td><td>89.9</td><td>88.3</td><td>91.3</td><td>0.0</td><td>0.0</td><td>8.6</td><td>8.6</td><td>57.35</td><td>17.69</td></tr>
    <tr style="background-color:#e6e6e6; font-weight:bold;"><td>SpinePose-x*</td><td>SpineTrack</td><td>37</td><td>75.9</td><td>80.1</td><td>77.6</td><td>81.8</td><td>86.3</td><td>88.5</td><td>86.3</td><td>89.7</td><td>89.3</td><td>91.0</td><td>88.9</td><td>89.9</td><td>50.69</td><td>17.37</td></tr>
  </tbody>
</table>

## SpineTrack Dataset

The **SpineTrack** dataset comprises both real and synthetic data:

- **SpineTrack-Real**: Annotated natural images with nine detailed spinal landmarks in addition to COCO joints.
- **SpineTrack-Unreal**: Synthetic subset rendered in Unreal Engine with biomechanically aligned OpenSim annotations.

To download:

```bash
git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack
```

Alternatively, use `wget` to download the dataset directly:

```bash
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip
```

In both cases, you obtain two zipped archives, `annotations.zip` (24.8 MB) and `images.zip` (19.4 GB), which unzip to the following structure:

```plaintext
spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/
```

All annotations follow the COCO format and are directly compatible with MMPose, Detectron2, and similar frameworks.
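Concretely, each COCO-format annotation stores keypoints as a flat list of `(x, y, visibility)` triples, which reshapes to `(37, 3)` for the SpineTrack layout. A minimal, hypothetical record (the values are illustrative only):

```python
import numpy as np

# Hypothetical minimal COCO-style keypoint record; field names follow the
# COCO schema, with 37 (x, y, visibility) triples instead of COCO's 17.
ann = {
    "image_id": 1,
    "category_id": 1,
    "num_keypoints": 37,
    "keypoints": [0.0, 0.0, 2.0] * 37,  # flat x, y, v triples; v=2 means visible
}
kpts = np.asarray(ann["keypoints"], dtype=float).reshape(-1, 3)
xy, vis = kpts[:, :2], kpts[:, 2]
```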

The synthetic subset was primarily employed within the **active learning pipeline** used to bootstrap and refine annotations for real-world images.  
All released **SpinePose** models were trained exclusively on the **real** portion of the dataset.

> [!WARNING]
> A small number of annotations in the synthetic subset are corrupted.  
> We recommend avoiding their use until the updated labels are released in the next dataset version.

## Citation

If you use SpinePose or SpineTrack in your research, please cite:

**BibTeX:**

```bibtex
@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}
```

**APA:**

_Khan, M. S. U., Krauß, S., & Stricker, D. (2025). Towards unconstrained 2D pose estimation of the human spine. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6171–6180)._

## Model Card Contact

[Muhammad Saif Ullah Khan](mailto:muhammad_saif_ullah.khan@dfki.de)