UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience Paper • 2603.24533 • Published about 19 hours ago • 25
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published Feb 12 • 93
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 60
ProAct: Agentic Lookahead in Interactive Environments Paper • 2602.05327 • Published Feb 5 • 26
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control Paper • 2511.15248 • Published Nov 19, 2025 • 7
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023 • 28