Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8, 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency Paper • 2504.18589 • Published Apr 24, 2025
What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations Paper • 2502.08279 • Published Feb 12, 2025
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization Paper • 2502.17540 • Published Feb 24, 2025
Thus Spake Long-Context Large Language Model Paper • 2502.17129 • Published Feb 24, 2025