UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 37
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models Paper • 2510.08559 • Published Oct 9, 2025 • 8
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models Paper • 2510.08559 • Published Oct 9, 2025 • 8
view article Article TimeScope: How Long Can Your Video Large Multimodal Model Go? +2 Jul 23, 2025 • 47
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 203
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis Paper • 2503.23145 • Published Mar 29, 2025 • 35
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17, 2025 • 22
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23, 2025 • 23
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23, 2025 • 23
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13, 2025 • 55
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13, 2025 • 55
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published Jan 6, 2025 • 7
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration Paper • 2412.13180 • Published Dec 17, 2024 • 13