---
license: apache-2.0
datasets:
- ylecun/mnist
language:
- en
tags:
- mnist
- '784'
- '256'
- transformerlm
- flow-matching
- dit
---

# 🧠🌊 TransformerLM (Flow 784, 256) – MNIST

Training-run artifacts from https://github.com/triloy8/transformerlm: a minimal flow-matching **DiT-style** image model trained on **MNIST** with a **fixed 784-token context** (28×28 image values) and **conditional generation** using discrete labels, plus a null label for classifier-free guidance (CFG).

## ✅ Key Facts

- **Model type:** `image_dit` flow-matching Transformer
- **Objective:** flow matching
- **Dataset:** MNIST (full 8-bit pixel values, 256 levels)
- **Context length:** 784 values (28×28 image)
- **Layers:** 8
- **Heads:** 16
- **d_model:** 256
- **d_ff:** 1024
- **Training setup:** single NVIDIA A40 (48 GB)
- **Runtime:** ~3 hours ⏱️

## 📦 What's Inside

- 8k steps (full run), including:
  - optimizer state
  - RNG state
  - safetensors weights
  - run config
- Best checkpoint alias (`v007000`)
- Latest checkpoint alias (`v008000`)

## 🚀 Reproducibility

Exact commit that launched the run:
https://github.com/triloy8/transformerlm/commit/01459662f08e83abc997966415d648563860859e
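
## 💡 CFG at a Glance

The null label mentioned above enables classifier-free guidance at sampling time. As a minimal, hypothetical sketch (the repo's actual sampling code may differ, and `guidance_scale` is an assumed knob, not a documented parameter of this run), CFG blends the model's conditional and unconditional predictions:

```python
def cfg_velocity(v_uncond: float, v_cond: float, guidance_scale: float) -> float:
    """Blend unconditional and conditional flow-matching velocity predictions.

    v_uncond: prediction with the null label (unconditional).
    v_cond:   prediction with the target class label.
    guidance_scale: 1.0 recovers the conditional prediction;
                    values > 1.0 extrapolate past it, strengthening guidance.
    """
    return v_uncond + guidance_scale * (v_cond - v_uncond)
```

In practice this is applied element-wise to the model's velocity field at each integration step, so both a conditional and an unconditional forward pass are needed per step.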