---
license: apache-2.0
datasets:
- ylecun/mnist
language:
- en
tags:
- mnist
- '784'
- '256'
- transformerlm
- flow-matching
- dit
---

# 🧠🌊 TransformerLM (Flow 784, 256) – MNIST

Training-run artifacts from https://github.com/triloy8/transformerlm: a minimal flow-matching **DiT-style** image model trained on **MNIST** with a **fixed 784-token context** (28×28 image values) and **conditional generation** using discrete labels, plus a null label for classifier-free guidance (CFG).

## ✅ Key Facts

- **Model type:** `image_dit` flow-matching Transformer
- **Objective:** flow matching
- **Dataset:** MNIST (full 8-bit pixel values, 256 levels)
- **Context length:** 784 values (28×28 image)
- **Layers:** 8
- **Heads:** 16
- **d_model:** 256
- **d_ff:** 1024
- **Training setup:** single NVIDIA A40 (48 GB)
- **Runtime:** ~3 hours ⏱️

## 📦 What's Inside

- 8k steps (full run), including:
  - optimizer state
  - RNG state
  - safetensors weights
  - run config
- Best checkpoint alias (`v007000`)
- Latest checkpoint alias (`v008000`)

## 🚀 Reproducibility

Exact commit that launched the run:
https://github.com/triloy8/transformerlm/commit/01459662f08e83abc997966415d648563860859e
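
## 💡 CFG at a Glance

The null label mentioned above enables classifier-free guidance at sampling time. As a minimal, hypothetical sketch (the repo's actual sampling code may differ, and `guidance_scale` is an assumed knob, not a documented parameter of this run), CFG blends the model's conditional and unconditional predictions:

```python
def cfg_velocity(v_uncond: float, v_cond: float, guidance_scale: float) -> float:
    """Blend unconditional and conditional flow-matching velocity predictions.

    v_uncond: prediction with the null label (unconditional).
    v_cond:   prediction with the target class label.
    guidance_scale: 1.0 recovers the conditional prediction;
                    values > 1.0 extrapolate past it, strengthening guidance.
    """
    return v_uncond + guidance_scale * (v_cond - v_uncond)
```

In practice this is applied element-wise to the model's velocity field at each integration step, so both a conditional and an unconditional forward pass are needed per step.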