Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 11 days ago • 59
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published Oct 22, 2025 • 30
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9, 2025 • 58
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17, 2025 • 41
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning Paper • 2507.14137 • Published Jul 18, 2025 • 34
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22, 2025 • 39
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21, 2025 • 68
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9, 2025 • 45
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation Paper • 2506.17202 • Published Jun 20, 2025 • 10
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3, 2025 • 58
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? Paper • 2506.02161 • Published Jun 2, 2025 • 13
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10, 2025 • 30
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18, 2025 • 139
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Paper • 2504.15280 • Published Apr 21, 2025 • 25
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Paper • 2503.18013 • Published Mar 23, 2025 • 20
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10, 2025 • 88
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7, 2025 • 122
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9, 2025 • 31
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published Mar 9, 2025 • 11