PotentialApplication
A collection of papers; each entry gives the paper title, its arXiv ID, and its upvote count.
- Let LLMs Break Free from Overthinking via Self-Braking Tuning
  Paper • 2505.14604 • Published • 23
- AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
  Paper • 2505.16944 • Published • 8
- Training Step-Level Reasoning Verifiers with Formal Verification Tools
  Paper • 2505.15960 • Published • 7
- The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
  Paper • 2505.15134 • Published • 6
- Paper • 2505.14674 • Published • 37
- General-Reasoner: Advancing LLM Reasoning Across All Domains
  Paper • 2505.14652 • Published • 24
- Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
  Paper • 2505.13430 • Published • 11
- Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
  Paper • 2505.14681 • Published • 10
- The Hallucination Tax of Reinforcement Finetuning
  Paper • 2505.13988 • Published • 8
- TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
  Paper • 2505.18125 • Published • 112
- QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization
  Paper • 2505.18092 • Published • 43
- Synthetic Data RL: Task Definition Is All You Need
  Paper • 2505.17063 • Published • 10
- NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
  Paper • 2505.16022 • Published • 4
- ARM: Adaptive Reasoning Model
  Paper • 2505.20258 • Published • 45
- Learning to Reason without External Rewards
  Paper • 2505.19590 • Published • 29
- Interleaved Reasoning for Large Language Models via Reinforcement Learning
  Paper • 2505.19640 • Published • 14
- Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
  Paper • 2505.17652 • Published • 6
- UFT: Unifying Supervised and Reinforcement Fine-Tuning
  Paper • 2505.16984 • Published • 3
- Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
  Paper • 2505.19075 • Published • 21
- Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
  Paper • 2505.11821 • Published • 14
- Text2Grad: Reinforcement Learning from Natural Language Feedback
  Paper • 2505.22338 • Published • 8
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
  Paper • 2505.22617 • Published • 131
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
  Paper • 2506.03295 • Published • 17
- ConfQA: Answer Only If You Are Confident
  Paper • 2506.07309 • Published • 10
- ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
  Paper • 2506.01241 • Published • 9
- Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward
  Paper • 2506.05433 • Published • 4
- RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
  Paper • 2506.08672 • Published • 30
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 277
- ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
  Paper • 2504.21370 • Published • 2
- Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
  Paper • 2507.01352 • Published • 56
- Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
  Paper • 2507.15778 • Published • 20
- Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
  Paper • 2508.04664 • Published • 13
- Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
  Paper • 2508.09776 • Published • 3
- Aryabhata: An exam-focused language model for JEE Math
  Paper • 2508.08665 • Published • 16
- Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
  Paper • 2508.10751 • Published • 28
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
  Paper • 2508.07407 • Published • 98
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 50
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48
- If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition
  Paper • 2508.16838 • Published • 1
- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
  Paper • 2508.16949 • Published • 23
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
  Paper • 2509.01055 • Published • 76
- DynaGuard: A Dynamic Guardrail Model With User-Defined Policies
  Paper • 2509.02563 • Published • 20
- ΔL Normalization: Rethink Loss Aggregation in RLVR
  Paper • 2509.07558 • Published • 7
- zELO: ELO-inspired Training Method for Rerankers and Embedding Models
  Paper • 2509.12541 • Published • 5
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
  Paper • 2509.15194 • Published • 33
- EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
  Paper • 2509.25175 • Published • 30
- Scaling Generalist Data-Analytic Agents
  Paper • 2509.25084 • Published • 18
- From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
  Paper • 2509.23196 • Published • 9
- SCI-Verifier: Scientific Verifier with Thinking
  Paper • 2509.24285 • Published • 9
- LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
  Paper • 2510.14943 • Published • 39
- LLM-guided Hierarchical Retrieval
  Paper • 2510.13217 • Published • 20
- MemMamba: Rethinking Memory Patterns in State Space Model
  Paper • 2510.03279 • Published • 72
- E²Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker
  Paper • 2510.22733 • Published • 31
- Redefining Retrieval Evaluation in the Era of LLMs
  Paper • 2510.21440 • Published • 8
- NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning
  Paper • 2510.18940 • Published • 8