Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Dec 4, 2025 • 64
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 Jul 29, 2025 • 213
PRLM/distilabel-intel-orca-dpo-pairs-balanced-subsets-translated Viewer • Updated May 6, 2025 • 8k • 5
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model Mar 10, 2025 • 146
Model Memory Utility 🚀 • 1k • Calculate VRAM needed to train or run Hugging Face models
The Ultra-Scale Playbook 🌌 • 3.69k • The ultimate guide to training LLM on large GPU Clusters
SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Paper • 2412.13053 • Published Dec 17, 2024