🧠 TransformerLM (Flow 784, 256) – MNIST
Training run artifacts from https://github.com/triloy8/transformerlm: a minimal flow-matching, DiT-style image model trained on MNIST with a fixed 784-token context (the 28×28 pixel values of an image) and conditional generation on discrete digit labels, plus a null label for classifier-free guidance (CFG).
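For intuition, here is a minimal sketch of what flow-matching sampling with CFG looks like. The model interface, the null-label id, and the value scaling are assumptions for illustration, not the repo's exact API: it assumes the transformer predicts a velocity field over the 784 flattened pixel values given a time `t` and a class label.

```python
import torch

NULL_LABEL = 10  # hypothetical id reserved for the unconditional "null" label

@torch.no_grad()
def sample_cfg(model, label: int, steps: int = 50, guidance: float = 3.0,
               seq_len: int = 784, device: str = "cpu") -> torch.Tensor:
    """Euler-integrate the learned velocity field from noise (t=0) to data (t=1)."""
    x = torch.randn(1, seq_len, device=device)        # start from Gaussian noise
    cond = torch.tensor([label], device=device)
    null = torch.tensor([NULL_LABEL], device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt, device=device)
        v_cond = model(x, t, cond)                    # label-conditional velocity
        v_null = model(x, t, null)                    # null-label (unconditional) velocity
        v = v_null + guidance * (v_cond - v_null)     # CFG: push toward the condition
        x = x + dt * v                                # explicit Euler step along the flow
    return x.view(28, 28)                             # reshape to a 28x28 image
```

With `guidance = 1.0` this reduces to plain conditional sampling; larger values trade sample diversity for label fidelity.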
✅ Key Facts
- Model type: image_dit (flow-matching Transformer)
- Objective: Flow matching
- Dataset: MNIST (full 8-bit pixel values, 256 levels)
- Context length: 784 values (28×28 image)
- Layers: 8
- Heads: 16
- d_model: 256
- d_ff: 1024
- Training setup: Single NVIDIA A40 (48GB)
- Runtime: ~3 hours ⏱️
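For concreteness, the hyperparameters above as a plain Python dict; the field names are illustrative and may not match the exact schema of the repo's run config:

```python
# Illustrative config mirroring the key facts above; field names are assumptions.
config = {
    "model_type": "image_dit",   # flow-matching DiT-style transformer
    "context_length": 784,       # 28 * 28 flattened pixel values
    "vocab_size": 256,           # full 8-bit pixel levels
    "num_layers": 8,
    "num_heads": 16,
    "d_model": 256,
    "d_ff": 1024,
    "num_labels": 10 + 1,        # 10 digit classes + 1 null label for CFG
}
assert config["d_model"] % config["num_heads"] == 0  # per-head dim = 256 // 16 = 16
```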
📦 What's Inside
- Checkpoints from the full 8k-step run, each including:
  - Optimizer state
  - RNG state
  - Safetensors weights (see the loading sketch after this list)
  - Run config
- Best checkpoint alias (v007000)
- Latest checkpoint alias (v008000)
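The safetensors weights can be inspected and loaded with the standard `safetensors` API. A minimal sketch, assuming a placeholder file name ("model.safetensors" is not necessarily the artifact's actual name):

```python
from safetensors.torch import load_file

# Inspect a checkpoint's weights without instantiating the model.
state_dict = load_file("model.safetensors")          # placeholder file name
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)   # a few parameter names/shapes

# Loading into a model would follow the usual PyTorch pattern, e.g.:
# model.load_state_dict(state_dict)  # `model` built from the run config above
```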
🔁 Reproducibility
Exact commit that launched the run: https://github.com/triloy8/transformerlm/commit/01459662f08e83abc997966415d648563860859e