TinyMemoryLM

⚠️ IMPORTANT NOTICE

  1. The inference script is not publicly available yet (soon!). This release contains only the model weights and tokenizer.
  2. The model is really dumb. This is a ~1M parameter research model designed for experimentation, not production use.
  3. Do not expect it to answer any questions. It is prone to repetition, hallucination, and format collapse.

Overview

TinyMemoryLM is an ultra-lightweight language model optimized for edge cases and architectural experimentation. Despite its small footprint, it incorporates several novel training innovations aimed at stabilizing tiny model convergence, including hybrid tokenization, loss boosting strategies, and context-aware relevance modeling.

This release includes both Pretrained Weights (base language modeling) and Instruction Weights (fine-tuned for chat/completion).

Files Provided

| File | Description |
|------|-------------|
| `tokenizer.json` | Hybrid word/character tokenizer vocabulary. |
| `pretrain.pt` | Base pretrained checkpoint (language modeling). |
| `model.pt` | Instruction-tuned checkpoint (SFT/chat). |

Model Specifications

| Parameter | Value |
|-----------|-------|
| Architecture | Transformer decoder |
| Parameters | ~1 million |
| Context length | 2,048 tokens |
| Dimensions | d_model=160, layers=6, heads=4, ffn=256 |
| Vocabulary | ~2,111 tokens (hybrid char + word) |
| Normalization | RMSNorm + QK-Norm |
| Position encoding | Rotary embeddings (RoPE) |
| Activation | SwiGLU |
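As a sanity check, the headline parameter count can be roughly reproduced from the table above. This is a back-of-envelope sketch that assumes untied input/output embeddings, bias-free linear layers, and a three-matrix SwiGLU FFN; none of these details are confirmed by the release, and the estimate is sensitive to weight tying and the exact head layout, so it lands in the same order of magnitude as the headline figure rather than matching it exactly.

```python
# Rough parameter estimate for TinyMemoryLM from the spec table.
# Assumptions (not confirmed by the release): untied embeddings,
# bias-free linear layers, SwiGLU with gate/up/down weight matrices.
d_model, n_layers, d_ffn, vocab = 160, 6, 256, 2111

embedding = vocab * d_model        # token embedding matrix
attention = 4 * d_model * d_model  # Wq, Wk, Wv, Wo per layer
swiglu    = 3 * d_model * d_ffn    # gate, up, down projections
norms     = 2 * d_model            # two RMSNorm scales per layer

per_layer = attention + swiglu + norms
total = embedding + n_layers * per_layer + d_model  # + final norm scale

print(f"~{total / 1e6:.2f}M parameters")
```

With embedding tying or a leaner FFN layout the count drops toward the quoted ~1 million; the point is only that the table's dimensions are consistent with a model in the low-millions range.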

Architecture Highlights

TinyMemoryLM implements several research-focused modifications to standard transformer architectures:

  • Hybrid Tokenizer: Combines character-level fallback with frequent word tokens to balance compression and vocabulary size.
  • QK-Norm: Applies RMSNorm to Query and Key projections for improved stability in low-precision training.
  • Word Token Loss Boosting: Upweights loss signals for multi-character tokens to prevent the model from ignoring them in favor of character-level spelling.
  • Response-Start Weighting: Prioritizes the first tokens of assistant responses to improve prompt conditioning.
  • Pretrain Replay: Mixes pretraining data during instruction tuning to prevent catastrophic forgetting of language fluency.
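The hybrid tokenization idea in the first bullet can be illustrated with a minimal sketch: frequent words get dedicated tokens, and anything else falls back to single characters. The vocabulary, the greedy longest-match strategy, and all names below are illustrative assumptions, not the released tokenizer's actual algorithm or contents.

```python
# Minimal sketch of a hybrid word/character tokenizer: frequent words
# get their own tokens, everything else falls back to characters.
# Vocabulary and matching strategy are illustrative assumptions only.

def build_vocab(words, alphabet):
    # Word tokens first, then single-character fallback tokens.
    return {tok: i for i, tok in enumerate(list(words) + list(alphabet))}

def encode(text, vocab):
    """Greedy longest-match: take the longest known token at each position."""
    ids, i = [], 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            i += 1  # unknown character: skip (a real tokenizer would emit <unk>)
    return ids

vocab = build_vocab(["the", "cat"], "abcdefghijklmnopqrstuvwxyz ")
ids = encode("the catnip", vocab)  # "the", " ", "cat", then chars n, i, p
```

Compression and vocabulary size trade off through how many word tokens are admitted: "the" and "cat" each cost one id here, while the rarer suffix "nip" falls back to three character ids.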

Training Loss Curve

Below is the training loss progression during the instruction tuning phase. Note the stability measures taken to prevent collapse in such a small parameter regime.

[Training loss curve image]
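One of the stability measures mentioned above, word-token loss boosting, amounts to a weighted average of per-token losses. The boost factor and example values below are assumptions for illustration; the release does not document the actual weighting scheme.

```python
# Sketch of word-token loss boosting: per-token losses are upweighted
# for multi-character (word) tokens so the model cannot minimize loss
# by spelling everything out character by character.
# The boost factor of 2.0 is an illustrative assumption.

def boosted_loss(token_losses, is_word_token, boost=2.0):
    """Weighted mean of per-token losses; word tokens count `boost` times."""
    weights = [boost if w else 1.0 for w in is_word_token]
    total = sum(l * w for l, w in zip(token_losses, weights))
    return total / sum(weights)

losses = [1.0, 4.0, 1.0]          # hypothetical per-token losses
word_mask = [False, True, False]  # middle token is a word token
avg = boosted_loss(losses, word_mask)
```

The same mechanism covers Response-Start Weighting: the mask simply marks the first tokens of the assistant turn instead of word tokens.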

Limitations & Expectations

Please manage your expectations when using TinyMemoryLM:

  • Repetition: Tiny models are prone to collapsing into repetitive token loops.
  • Knowledge: The model has limited world knowledge due to parameter constraints.
  • Usage: This model is intended for research, educational purposes, and architectural benchmarking. It is not suitable for assistant tasks or reliable information retrieval.

Generated for research purposes. Use responsibly.

