Voxtral Mini 4B Realtime 4bit (float16)

This is a 4-bit quantized, float16-base MLX conversion of mistralai/Voxtral-Mini-4B-Realtime-2602.

Which variant should you pick?

  • M1 / M2: use this repo (-4bit-fp16). Metal on M1/M2 has no native bfloat16 ALU, so bf16 ops fall back to a slower path; float16 stays on the fast GPU path.
  • M3 / M4+: use iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit (bf16). bf16 is natively supported there, runs at the same speed as fp16, and its wider exponent range gives slightly safer numerics.

Only the non-quantized tensors (norms, biases, quantization scales, some embeddings) differ between the two repos; the quantized matmul weights are bit-identical. At temperature 0, transcription output is byte-identical on a 20 s French clip (verified locally).

Conversion

Source model:

  • mistralai/Voxtral-Mini-4B-Realtime-2602

Local conversion command:

python -m mlx_audio.convert \
  --hf-path mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --mlx-path /path/to/Voxtral-Mini-4B-Realtime-2602-4bit-fp16 \
  --quantize \
  --q-group-size 64 \
  --q-bits 4 \
  --dtype float16 \
  --model-domain stt

Quantization config:

  • bits: 4
  • group size: 64
  • mode: affine
  • non-quant dtype: float16
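For intuition, affine group quantization with these settings maps each group of 64 weights to 4-bit codes plus a per-group scale and offset. The sketch below is illustrative only (MLX does the actual packing internally); the function names are made up for this example:

```python
import numpy as np

def affine_quantize(x, bits=4, group_size=64):
    """Quantize a 1-D float array in independent groups (affine mode)."""
    g = x.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)          # per-group offset (zero point)
    scale = (g.max(axis=1, keepdims=True) - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)   # guard constant groups
    q = np.clip(np.round((g - lo) / scale), 0, 2**bits - 1).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q, scale, lo, shape):
    """Reconstruct approximate weights from the 4-bit codes."""
    return (q * scale + lo).reshape(shape)
```

With bits=4 and group_size=64, storage is 4 bits per weight plus two scalars per 64-weight group, i.e. roughly 4.5 bits per weight when scales and offsets are kept in float16.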

Files

Only the MLX runtime artifacts needed for inference:

  • model.safetensors
  • model.safetensors.index.json
  • config.json
  • generation_config.json
  • params.json
  • processor_config.json
  • tekken.json
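If you want to fetch only these artifacts (skipping any other files in the repo), huggingface_hub's snapshot_download accepts an allow_patterns filter. A minimal sketch; the fetch helper and the local_dir default are this example's own, not part of mlx-audio:

```python
# Runtime artifacts listed above.
ALLOW_PATTERNS = [
    "model.safetensors",
    "model.safetensors.index.json",
    "config.json",
    "generation_config.json",
    "params.json",
    "processor_config.json",
    "tekken.json",
]

def fetch(local_dir="Voxtral-Mini-4B-Realtime-2602-4bit-fp16"):
    # Imported lazily so the file list is usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        "iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16",
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
```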

Usage

pip install "mlx-audio[stt]"

from mlx_audio.stt.utils import load_model

model = load_model("iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16")
result = model.generate("audio.wav")
print(result.text)

Notes

  • Base model license remains Apache 2.0.
  • On M3/M4, prefer the -4bit (bf16) repo; there is no speed benefit to fp16 there and bf16's wider exponent range is slightly more robust.
  • Transcription quality was verified identical to the bf16 variant at temperature=0 on a 20 s French parliamentary audio clip.
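The "wider exponent range" point can be seen directly: float16 has a 5-bit exponent and overflows above 65504, while bfloat16 keeps float32's 8-bit exponent. numpy has no native bfloat16, so float32 stands in for bf16's range in this sketch:

```python
import numpy as np

x = np.float32(70000.0)       # above float16's max finite value of 65504
overflowed = np.float16(x)    # overflows to inf in fp16
preserved = np.float32(x)     # fine with an 8-bit exponent (bf16/fp32)
print(overflowed, preserved)
```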