Reasoning Models
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
• 2501.18585
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper
• 2502.07374
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
• 2502.06703
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper
• 2503.05379
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper
• 2503.05179
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper
• 2503.07365
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper
• 2503.07536
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
• 2503.12605
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Paper
• 2503.20641
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
• 2503.24290
Effectively Controlling Reasoning Models through Thinking Intervention
Paper
• 2503.24370
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper
• 2503.21614
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
Paper
• 2503.24376
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Paper
• 2504.00294
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper
• 2506.01939
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Paper
• 2506.07527
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.06941
Reinforcement Pre-Training
Paper
• 2506.08007
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Paper
• 2506.07976
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper
• 2506.18896
Kwai Keye-VL Technical Report
Paper
• 2507.01949
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper
• 2506.18254
Perception-Aware Policy Optimization for Multimodal Reasoning
Paper
• 2507.06448
Test-Time Scaling with Reflective Generative Model
Paper
• 2507.01951
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper
• 2507.06261
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper
• 2507.05255
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
The Invisible Leash: Why RLVR May Not Escape Its Origin
Paper
• 2507.14843
Group Sequence Policy Optimization
Paper
• 2507.18071
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper
• 2507.15758
THU-KEG/LongWriter-Zero-32B
Text Generation
• 33B
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
• 2507.14958
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper
• 2508.10433
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Paper
• 2510.08872
RL makes MLLMs see better than SFT
Paper
• 2510.16333
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper
• 2510.25992
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Paper
• 2601.05432
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Paper
• 2602.02493
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Paper
• 2602.02488
Code2World: A GUI World Model via Renderable Code Generation
Paper
• 2602.09856
Helios: Real Real-Time Long Video Generation Model
Paper
• 2603.04379
Phi-4-reasoning-vision-15B Technical Report
Paper
• 2603.03975