Papers + RL/Reasoning - a sugatoray Collection

updated 7 days ago

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 145
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7, 2025 • 26
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11, 2025 • 33
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15, 2025 • 20
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

Paper • 2504.15716 • Published Apr 22, 2025 • 12
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30, 2025 • 59
DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1, 2025 • 54
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12, 2025 • 82
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14, 2025 • 76
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

Paper • 2505.12504 • Published May 18, 2025 • 24
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17, 2025 • 58
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
One-RL-to-See-Them-All/Orsta-Data-47k

Updated Jun 4, 2025 • 352 • 17
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
RL with KL penalties is better viewed as Bayesian inference

Paper • 2205.11275 • Published May 23, 2022 • 1
Asymptotics of Language Model Alignment

Paper • 2404.01730 • Published Apr 2, 2024 • 1
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

Paper • 2505.19000 • Published May 25, 2025 • 42
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Paper • 2505.15778 • Published May 21, 2025 • 19
ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29, 2025 • 45
Table-R1: Inference-Time Scaling for Table Reasoning

Paper • 2505.23621 • Published May 29, 2025 • 93
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.09250 • Published Jun 10, 2025 • 27
Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 67
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8, 2025 • 31
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14, 2025 • 90
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper • 2507.09477 • Published Jul 13, 2025 • 88
osmosis-ai/Osmosis-Apply-1.7B

Text Generation • 2B • Updated Jul 3, 2025 • 17 • 95
Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28, 2025 • 32
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 207
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50
Training-Free Group Relative Policy Optimization

Paper • 2510.08191 • Published Oct 9, 2025 • 45
The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 33
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published Jan 26, 2026 • 41
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought

Paper • 2510.24941 • Published Oct 28, 2025 • 4