# Voxtral Mini 4B Realtime 4bit (float16)

This is a 4-bit quantized, float16-base MLX conversion of mistralai/Voxtral-Mini-4B-Realtime-2602.
## Which variant should you pick?

| Chip | Recommended | Why |
|---|---|---|
| M1 / M2 | This repo (`-4bit-fp16`) | Metal on M1/M2 has no native bfloat16 ALU; bf16 ops fall back to a slower path. float16 stays on the fast GPU path. |
| M3 / M4+ | `iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit` (bf16) | bf16 is natively supported and gives the same speed as fp16 with a wider dynamic range (slightly safer numerics). |
Only the non-quantized weights differ between the two repos (norms, biases, scales, some embeddings). The quantized mat-mul weights are bit-identical. Transcription output is byte-identical on a 20 s French clip at temperature 0 (verified locally).
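The chip-based recommendation above can be automated. A minimal sketch, assuming macOS (the chip name comes from `sysctl`; the `pick_repo` helper is hypothetical, not part of mlx-audio):

```python
import subprocess

FP16_REPO = "iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16"
BF16_REPO = "iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit"

def pick_repo(chip: str) -> str:
    """Map an Apple Silicon chip name to the recommended repo.

    M1/M2 lack native bfloat16 on Metal, so they get the fp16 variant;
    M3 and newer run bf16 at full speed.
    """
    for gen in ("M1", "M2"):
        if gen in chip:
            return FP16_REPO
    return BF16_REPO

def detect_chip() -> str:
    # macOS only, e.g. "Apple M2 Pro"
    return subprocess.check_output(
        ["sysctl", "-n", "machdep.cpu.brand_string"], text=True
    ).strip()
```

Since the quantized weights are bit-identical, picking the "wrong" variant only costs speed (on M1/M2) or a little numeric headroom (on M3+), never correctness.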
## Conversion

Source model: `mistralai/Voxtral-Mini-4B-Realtime-2602`

Local conversion command:

```shell
python -m mlx_audio.convert \
  --hf-path mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --mlx-path /path/to/Voxtral-Mini-4B-Realtime-2602-4bit-fp16 \
  --quantize \
  --q-group-size 64 \
  --q-bits 4 \
  --dtype float16 \
  --model-domain stt
```
Quantization config:

- bits: 4
- group size: 64
- mode: affine
- non-quant dtype: float16
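For intuition: affine mode stores, per group of 64 weights, 4-bit integers plus a per-group scale and bias. A NumPy sketch of the round trip (illustrative only, not the exact MLX kernel):

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Group-wise affine quantization: w ≈ q * scale + bias per group."""
    w = w.reshape(-1, group_size)
    levels = 2**bits - 1                        # 15 levels for 4-bit
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)    # guard constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min                      # bias = group minimum

def affine_dequantize(q, scale, bias):
    return q * scale + bias

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale, bias = affine_quantize(w)
w_hat = affine_dequantize(q, scale, bias).reshape(-1)
max_err = np.abs(w - w_hat).max()   # bounded by scale/2 within each group
```

The dequantized error is bounded by half a quantization step per group, which is why the non-quant dtype (fp16 vs bf16) only affects the small tensors stored outside this scheme.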
## Files

Only the MLX runtime artifacts needed for inference:

- model.safetensors
- model.safetensors.index.json
- config.json
- generation_config.json
- params.json
- processor_config.json
- tekken.json
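A quick way to confirm a local download is complete is to check for exactly these artifacts; a small sketch (the `missing_artifacts` helper is hypothetical, and `REQUIRED_FILES` just mirrors the list above):

```python
from pathlib import Path

REQUIRED_FILES = [
    "model.safetensors",
    "model.safetensors.index.json",
    "config.json",
    "generation_config.json",
    "params.json",
    "processor_config.json",
    "tekken.json",
]

def missing_artifacts(model_dir: str) -> list[str]:
    """Return the required inference files absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```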
## Usage

```shell
pip install "mlx-audio[stt]"
```

```python
from mlx_audio.stt.utils import load_model

model = load_model("iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16")
result = model.generate("audio.wav")
print(result.text)
```
## Notes

- Base model license remains Apache 2.0.
- On M3/M4, prefer the `-4bit` (bf16) repo; there is no speed benefit to fp16 there, and bf16's wider exponent range is slightly more robust.
- Transcription quality was verified identical to the bf16 variant at `temperature=0` on a 20 s French parliamentary audio clip.
## Model tree for iris-sfg/Voxtral-Mini-4B-Realtime-2602-4bit-fp16

- Base model: mistralai/Ministral-3-3B-Base-2512
- Finetuned from: mistralai/Voxtral-Mini-4B-Realtime-2602