Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation • Paper • 2602.03806 • Published 5 days ago • 4
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents • Paper • 2601.18217 • Published 14 days ago • 11
GUI-Drag • Collection • Beyond Clicking: A step towards generalist grounding via text dragging • 4 items • Updated 21 days ago
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents • Paper • 2510.24702 • Published Oct 28, 2025 • 29
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution • Paper • 2510.25726 • Published Oct 29, 2025 • 46
Simulating Environments with Reasoning Models for Agent Training • Paper • 2511.01824 • Published Nov 3, 2025 • 2