Article • Distribution Matching Prevents Mode Collapse in Training Reasoning Models • 4 days ago • 2
The Smol Training Playbook 📚 • Featured • 3.05k • The secrets to building world-class LLMs
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30, 2025 • 51
The Ultra-Scale Playbook 🌌 • 3.75k • The ultimate guide to training LLM on large GPU Clusters