Voxtral Mini 4B Realtime 4bit (float16)

This is a 4-bit quantized, float16-base MLX conversion of mistralai/Voxtral-Mini-4B-Realtime-2602.

Which variant should you pick?

  • M1 / M2: use this repo (-4bit-fp16). Metal on M1/M2 has no native bfloat16 ALU, so bf16 ops fall back to a slower path; float16 stays on the fast GPU path.
  • M3 / M4+: use iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit (bf16). bf16 is natively supported there, runs at the same speed as fp16, and its wider exponent range gives slightly safer numerics.

Only the non-quantized tensors (norms, biases, quantization scales, some embeddings) differ between the two repos; the quantized matmul weights are bit-identical. At temperature 0, transcription output is byte-identical on a 20 s French clip (verified locally).

Conversion

Source model:

  • mistralai/Voxtral-Mini-4B-Realtime-2602

Local conversion command:

python -m mlx_audio.convert \
  --hf-path mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --mlx-path /path/to/Voxtral-Mini-4B-Realtime-2602-4bit-fp16 \
  --quantize \
  --q-group-size 64 \
  --q-bits 4 \
  --dtype float16 \
  --model-domain stt

Quantization config:

  • bits: 4
  • group size: 64
  • mode: affine
  • non-quant dtype: float16
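For intuition, affine group quantization with these settings maps each group of 64 weights to 4-bit codes plus a per-group scale and offset. The sketch below is illustrative only (MLX does the actual packing internally); the function names are made up for this example:

```python
import numpy as np

def affine_quantize(x, bits=4, group_size=64):
    """Quantize a 1-D float array in independent groups (affine mode)."""
    g = x.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)          # per-group offset (zero point)
    scale = (g.max(axis=1, keepdims=True) - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)   # guard constant groups
    q = np.clip(np.round((g - lo) / scale), 0, 2**bits - 1).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q, scale, lo, shape):
    """Reconstruct approximate weights from the 4-bit codes."""
    return (q * scale + lo).reshape(shape)
```

With bits=4 and group_size=64, storage is 4 bits per weight plus two scalars per 64-weight group, i.e. roughly 4.5 bits per weight when scales and offsets are kept in float16.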

Files

Only the MLX runtime artifacts needed for inference:

  • model.safetensors
  • model.safetensors.index.json
  • config.json
  • generation_config.json
  • params.json
  • processor_config.json
  • tekken.json
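If you want to fetch only these artifacts (skipping any other files in the repo), huggingface_hub's snapshot_download accepts an allow_patterns filter. A minimal sketch; the fetch helper and the local_dir default are this example's own, not part of mlx-audio:

```python
# Runtime artifacts listed above.
ALLOW_PATTERNS = [
    "model.safetensors",
    "model.safetensors.index.json",
    "config.json",
    "generation_config.json",
    "params.json",
    "processor_config.json",
    "tekken.json",
]

def fetch(local_dir="Voxtral-Mini-4B-Realtime-2602-4bit-fp16"):
    # Imported lazily so the file list is usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        "iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16",
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
```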

Usage

pip install "mlx-audio[stt]"

from mlx_audio.stt.utils import load_model

model = load_model("iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16")
result = model.generate("audio.wav")
print(result.text)

Notes

  • Base model license remains Apache 2.0.
  • On M3/M4, prefer the -4bit (bf16) repo; there is no speed benefit to fp16 there and bf16's wider exponent range is slightly more robust.
  • Transcription quality was verified identical to the bf16 variant at temperature=0 on a 20 s French parliamentary audio clip.
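The "wider exponent range" point can be seen directly: float16 has a 5-bit exponent and overflows above 65504, while bfloat16 keeps float32's 8-bit exponent. numpy has no native bfloat16, so float32 stands in for bf16's range in this sketch:

```python
import numpy as np

x = np.float32(70000.0)       # above float16's max finite value of 65504
overflowed = np.float16(x)    # overflows to inf in fp16
preserved = np.float32(x)     # fine with an 8-bit exponent (bf16/fp32)
print(overflowed, preserved)
```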