Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation Paper • 2601.11258 • Published 12 days ago • 5
One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment Paper • 2601.18731 • Published 1 day ago • 6
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents Paper • 2601.18217 • Published 2 days ago • 8
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published 2 days ago • 13
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Paper • 2601.18778 • Published 1 day ago • 25
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published about 15 hours ago • 16
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents Paper • 2601.16746 • Published 5 days ago • 74
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 7 days ago • 66
Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs Paper • 2601.11061 • Published 12 days ago • 7
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment Paper • 2601.14249 • Published 8 days ago • 8
A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification Paper • 2601.13288 • Published 9 days ago • 12
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models Paper • 2601.14152 • Published 8 days ago • 4
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences Paper • 2601.06789 • Published 17 days ago • 77
Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published 16 days ago • 112
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests Paper • 2601.06953 • Published 17 days ago • 43