GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 2 days ago • 115
Self-Correcting Delta Transformer Collection Self-Correcting Delta Transformer - DDL provides the Hardware mechanism (The Erazor), NL solves the software problem. • 3 items • Updated 5 days ago • 2
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published 11 days ago • 125
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 11 days ago • 106
Nested Learning: The Illusion of Deep Learning Architectures Paper • 2512.24695 • Published 10 days ago • 33
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published 11 days ago • 55
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning Paper • 2511.16043 • Published Nov 20, 2025 • 108
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 114
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 248
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 86
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 226
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance Paper • 2511.12997 • Published Nov 17, 2025 • 10