🧠 TransformerLM (Flow 784, 256) – MNIST
Training run artifacts from https://github.com/triloy8/transformerlm: a minimal flow-matching, DiT-style image model trained on MNIST with a fixed 784-token context (the 28×28 pixel values of an image) and conditional generation on discrete digit labels, plus a null label for classifier-free guidance (CFG).
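For intuition, here is a minimal sketch of what flow-matching sampling with CFG looks like. The model interface, the null-label id, and the value scaling are assumptions for illustration, not the repo's exact API: it assumes the transformer predicts a velocity field over the 784 flattened pixel values given a time `t` and a class label.

```python
import torch

NULL_LABEL = 10  # hypothetical id reserved for the unconditional "null" label

@torch.no_grad()
def sample_cfg(model, label: int, steps: int = 50, guidance: float = 3.0,
               seq_len: int = 784, device: str = "cpu") -> torch.Tensor:
    """Euler-integrate the learned velocity field from noise (t=0) to data (t=1)."""
    x = torch.randn(1, seq_len, device=device)        # start from Gaussian noise
    cond = torch.tensor([label], device=device)
    null = torch.tensor([NULL_LABEL], device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt, device=device)
        v_cond = model(x, t, cond)                    # label-conditional velocity
        v_null = model(x, t, null)                    # null-label (unconditional) velocity
        v = v_null + guidance * (v_cond - v_null)     # CFG: push toward the condition
        x = x + dt * v                                # explicit Euler step along the flow
    return x.view(28, 28)                             # reshape to a 28x28 image
```

With `guidance = 1.0` this reduces to plain conditional sampling; larger values trade sample diversity for label fidelity.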
✅ Key Facts
- Model type: image_dit (flow-matching Transformer)
- Objective: Flow matching
- Dataset: MNIST (full 8-bit pixel values, 256 levels)
- Context length: 784 values (28×28 image)
- Layers: 8
- Heads: 16
- d_model: 256
- d_ff: 1024
- Training setup: Single NVIDIA A40 (48GB)
- Runtime: ~3 hours ⏱️
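For concreteness, the hyperparameters above as a plain Python dict; the field names are illustrative and may not match the exact schema of the repo's run config:

```python
# Illustrative config mirroring the key facts above; field names are assumptions.
config = {
    "model_type": "image_dit",   # flow-matching DiT-style transformer
    "context_length": 784,       # 28 * 28 flattened pixel values
    "vocab_size": 256,           # full 8-bit pixel levels
    "num_layers": 8,
    "num_heads": 16,
    "d_model": 256,
    "d_ff": 1024,
    "num_labels": 10 + 1,        # 10 digit classes + 1 null label for CFG
}
assert config["d_model"] % config["num_heads"] == 0  # per-head dim = 256 // 16 = 16
```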
📦 What's Inside
- Checkpoints from the full 8k-step run, each including:
  - Optimizer state
  - RNG state
  - Safetensors weights (see the loading sketch after this list)
  - Run config
- Best checkpoint alias (v007000)
- Latest checkpoint alias (v008000)
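The safetensors weights can be inspected and loaded with the standard `safetensors` API. A minimal sketch, assuming a placeholder file name ("model.safetensors" is not necessarily the artifact's actual name):

```python
from safetensors.torch import load_file

# Inspect a checkpoint's weights without instantiating the model.
state_dict = load_file("model.safetensors")          # placeholder file name
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)   # a few parameter names/shapes

# Loading into a model would follow the usual PyTorch pattern, e.g.:
# model.load_state_dict(state_dict)  # `model` built from the run config above
```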
🔁 Reproducibility
Exact commit that launched the run: https://github.com/triloy8/transformerlm/commit/01459662f08e83abc997966415d648563860859e