Papers + RL/Reasoning
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published • 145
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published • 26
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published • 33
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
• 2504.11343
• Published • 20
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published • 35
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large
Language Models
Paper
• 2504.15716
• Published • 12
WebThinker: Empowering Large Reasoning Models with Deep Research
Capability
Paper
• 2504.21776
• Published • 59
DeepCritic: Deliberate Critique with Large Language Models
Paper
• 2505.00662
• Published • 54
MiMo: Unlocking the Reasoning Potential of Language Model -- From
Pretraining to Posttraining
Paper
• 2505.07608
• Published • 82
Insights into DeepSeek-V3: Scaling Challenges and Reflections on
Hardware for AI Architectures
Paper
• 2505.09343
• Published • 76
CPGD: Toward Stable Rule-based Reinforcement Learning for Language
Models
Paper
• 2505.12504
• Published • 24
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via
Reinforcement Learning
Paper
• 2505.11896
• Published • 58
Paper
• 2505.14674
• Published • 37
One-RL-to-See-Them-All/Orsta-Data-47k
Updated • 352
• 17
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published • 62
RL with KL penalties is better viewed as Bayesian inference
Paper
• 2205.11275
• Published • 1
Asymptotics of Language Model Alignment
Paper
• 2404.01730
• Published • 1
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided
Iterative Policy Optimization
Paper
• 2505.19000
• Published • 42
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous
Concept Space
Paper
• 2505.15778
• Published • 19
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper
• 2505.23762
• Published • 45
Table-R1: Inference-Time Scaling for Table Reasoning
Paper
• 2505.23621
• Published • 93
Reinforcement Pre-Training
Paper
• 2506.08007
• Published • 265
Comment on The Illusion of Thinking: Understanding the Strengths and
Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.09250
• Published • 27
Paper
• 2506.10910
• Published • 67
Does Math Reasoning Improve General LLM Capabilities? Understanding
Transferability of LLM Reasoning
Paper
• 2507.00432
• Published • 79
AutoTriton: Automatic Triton Programming with Reinforcement Learning in
LLMs
Paper
• 2507.05687
• Published • 31
Reasoning or Memorization? Unreliable Results of Reinforcement Learning
Due to Data Contamination
Paper
• 2507.10532
• Published • 90
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning
Systems in LLMs
Paper
• 2507.09477
• Published • 88
osmosis-ai/Osmosis-Apply-1.7B
Text Generation
• 2B • Updated • 17
• 95
Geometric-Mean Policy Optimization
Paper
• 2507.20673
• Published • 32
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published • 207
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published • 50
Training-Free Group Relative Policy Optimization
Paper
• 2510.08191
• Published • 45
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published • 33
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published • 106
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published • 41
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps
in Chain-of-Thought
Paper
• 2510.24941
• Published • 4