🧠🌊 TransformerLM (Flow 784, 256) – MNIST

Training run artifacts from https://github.com/triloy8/transformerlm: a minimal flow-matching, DiT-style image model trained on MNIST with a fixed 784-token context (28×28 pixel values) and conditional generation via discrete class labels plus a null label for classifier-free guidance (CFG).
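At sampling time, CFG blends the class-conditional prediction with the null-label (unconditional) one. A minimal sketch, assuming the model emits per-token velocity predictions as flat lists; the function name and signature here are illustrative, not the repo's actual API:

```python
def cfg_velocity(v_cond, v_uncond, guidance_scale=2.0):
    """Classifier-free guidance: extrapolate from the null-label
    (unconditional) prediction toward the class-conditional one.
    guidance_scale == 1.0 recovers the purely conditional model."""
    return [vu + guidance_scale * (vc - vu)
            for vc, vu in zip(v_cond, v_uncond)]
```

During training, the class label is randomly dropped to the null label, so a single network learns both branches of this combination.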

✅ Key Facts

  • Model type: image_dit flow-matching Transformer
  • Objective: Flow matching
  • Dataset: MNIST (full 8-bit pixel values, 256 levels)
  • Context length: 784 values (28×28 image)
  • Layers: 8
  • Heads: 16
  • d_model: 256
  • d_ff: 1024
  • Training setup: Single NVIDIA A40 (48GB)
  • Runtime: ~3 hours ⏱️
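The flow-matching objective trains the network to regress a velocity field along a path from noise to data; a common choice is the linear (rectified-flow) path, whose velocity target is constant. A hedged pure-Python sketch of that construction, not code taken from the repo:

```python
def flow_matching_pair(x0, x1, t):
    """Linear interpolation x_t = (1 - t) * x0 + t * x1 between a noise
    sample x0 and a data sample x1, plus the velocity target x1 - x0
    that the model is trained to predict (e.g. via an MSE loss)."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]  # constant along this path
    return x_t, target_v
```

With this model's 784-token MNIST context, `x0` and `x1` would each be length-784 vectors of pixel values.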

📦 What's Inside

  • Checkpoints from the full 8k-step run, each including:
    • Optimizer state
    • RNG state
    • Safetensors weights
  • Run config
  • Best checkpoint alias (v007000)
  • Latest checkpoint alias (v008000)

🚀 Reproducibility

Exact commit that launched the run: https://github.com/triloy8/transformerlm/commit/01459662f08e83abc997966415d648563860859e


Dataset used to train trixyL/transformerlm-flow-mnist: MNIST