Motion-o: Trajectory-Grounded Video Reasoning
Paper: arXiv:2603.18856
Motion-o is a family of Qwen2.5-VL models fine-tuned for motion-aware trajectory reasoning in videos, introduced in the paper Motion-o: Trajectory-Grounded Video Reasoning.
The models learn to produce structured <think>...</think> chains with <obj>, <box>, <t>, and <motion> tags that describe object motion over time, and to answer a final question about the video.
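Because the reasoning trace is tagged, it is straightforward to post-process. The sketch below is illustrative only — the exact tag grammar and the sample response are assumptions, not the canonical schema from the paper:

```python
import re

def parse_motion_trace(response: str) -> dict:
    """Split a Motion-o style response into its tagged trace and final answer.

    Assumes (hypothetically) that the model emits <think>...</think> followed
    by the answer text, with <obj>, <box>, <t>, and <motion> tags inside.
    """
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    trace = think.group(1).strip() if think else ""
    # Everything outside the <think> block is treated as the final answer.
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return {
        "objects": re.findall(r"<obj>(.*?)</obj>", trace),
        "boxes": re.findall(r"<box>(.*?)</box>", trace),
        "timestamps": re.findall(r"<t>(.*?)</t>", trace),
        "motions": re.findall(r"<motion>(.*?)</motion>", trace),
        "answer": answer,
    }

# Fabricated example response for illustration:
example = (
    "<think><obj>red car</obj> at <box>[10, 20, 110, 90]</box> "
    "<t>0.0s</t> <motion>moving left to right</motion></think> "
    "The car crosses the intersection."
)
print(parse_motion_trace(example)["answer"])  # -> The car crosses the intersection.
```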
All variants live in this repository as subfolders:
- (root) – grpo_dense_t07_4737145/checkpoint-800/merged
- open-o3-mcot – open-o3_grpo_v3_074917638/checkpoint-600/merged
- open-o3-mcot-no-vg – open-o3_grpo_v2_4896760/checkpoint-1000/merged

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# 1) Motion-O (no visual grounding) – repo root
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "bishoygaloaa/motion-o",
    torch_dtype="auto",
)
processor = AutoProcessor.from_pretrained("bishoygaloaa/motion-o")

# 2) Open-o3 + MCoT (with visual grounding)
model_vg = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "bishoygaloaa/motion-o",
    subfolder="open-o3-mcot",
    torch_dtype="auto",
)
processor_vg = AutoProcessor.from_pretrained(
    "bishoygaloaa/motion-o",
    subfolder="open-o3-mcot",
)

# 3) Open-o3 + MCoT (no visual grounding)
model_no_vg = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "bishoygaloaa/motion-o",
    subfolder="open-o3-mcot-no-vg",
    torch_dtype="auto",
)
processor_no_vg = AutoProcessor.from_pretrained(
    "bishoygaloaa/motion-o",
    subfolder="open-o3-mcot-no-vg",
)
```
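Once loaded, any variant accepts the standard Qwen2.5-VL chat format. The message structure below is a minimal sketch; the video path and question are placeholders, not files from this repo:

```python
# Build a Qwen2.5-VL style chat message for a video question.
# The video path and question text are placeholders for illustration.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "file:///path/to/clip.mp4"},
            {"type": "text", "text": "Describe the motion of the main object."},
        ],
    }
]

# With a loaded model and processor, the usual generation loop applies, e.g.:
# text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# inputs = processor(text=[text], videos=..., return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=512)
```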
If you use Motion-O in your work, please cite:
```bibtex
@article{galoaa2026motion,
  title   = {Motion-Aware Trajectory Reasoning for Video Understanding},
  author  = {Galoaa, Bishoy* and Moezzi, Shayda* and Bai, Xiangyu and Ostadabbas, Sarah},
  journal = {arXiv preprint arXiv:2603.18856},
  year    = {2026},
  url     = {https://arxiv.org/abs/2603.18856}
}
```
Base model: Qwen/Qwen2.5-VL-7B-Instruct