FP16 safetensors (HuggingFace format) of the ternary Bonsai-8B model. This repo exists for users who want to run Ternary Bonsai with stock HuggingFace tooling or with frameworks that don't yet support any of the packed ternary formats. The MLX 2-bit format is currently the only packed format available; more formats for other backends are coming soon.
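As a rough illustration of what "stock HuggingFace tooling" can look like, here is a minimal sketch using the `transformers` library. It assumes the checkpoint loads as a standard causal language model through `AutoModelForCausalLM`, which this page does not confirm. `REPO_ID` is a hypothetical placeholder to be replaced with this repo's actual id, and `generate_sample` is a helper name invented for this example.

```python
# Hypothetical placeholder: replace with the actual id of this repo.
REPO_ID = "<org>/<fp16-repo-name>"


def generate_sample(prompt: str, repo_id: str = REPO_ID, max_new_tokens: int = 32) -> str:
    """Load the unpacked FP16 weights and generate a short completion.

    Assumes the checkpoint is a standard causal LM (not confirmed here).
    """
    # Imported here so the heavy dependencies are only needed when generating.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # Request half precision so the weights stay in FP16 as stored on disk.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_sample("Hello")` downloads the full FP16 weights on first use, so expect a large download and a correspondingly large memory footprint.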
We strongly recommend using the natively packed models instead. The packed format is where all the benefits of Bonsai come from: up to 9x memory reduction, 5x faster inference, and lower energy per token. This unpacked FP16 version is full-size and does not provide any of those advantages.
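To make the size difference concrete, the following back-of-the-envelope sketch estimates the weight footprint only. It assumes a nominal 8 billion parameters (inferred from the "8B" in the model name; the real count may differ) and takes the "up to 9x" figure above at face value. It ignores activations, the KV cache, and any per-format overhead.

```python
# Assumption: nominal parameter count taken from the "8B" in the model name.
NUM_PARAMS = 8_000_000_000
BYTES_PER_FP16_WEIGHT = 2   # FP16 stores each weight in 16 bits
MAX_MEMORY_REDUCTION = 9    # the "up to 9x" figure quoted above

# Weight-only footprint of this unpacked FP16 repo, in decimal gigabytes.
fp16_gb = NUM_PARAMS * BYTES_PER_FP16_WEIGHT / 1e9

# Footprint implied by the best-case reduction for the packed format.
packed_gb = fp16_gb / MAX_MEMORY_REDUCTION

print(f"FP16 weights:      ~{fp16_gb:.1f} GB")
print(f"Packed (best case): ~{packed_gb:.2f} GB")
```

Under these assumptions the FP16 weights alone occupy roughly 16 GB, against under 2 GB in the best case for the packed format.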
For the optimized ternary release model (recommended):