Tags: Automatic Speech Recognition · MLX · Safetensors · English · asr · speech-recognition · apple-silicon · parakeet · tdt · sonic-speech · quantized · int4
How to use sonic-speech/parakeet-tdt-0.6b-v2-int4 with MLX:

```shell
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir parakeet-tdt-0.6b-v2-int4 sonic-speech/parakeet-tdt-0.6b-v2-int4
```
# Parakeet TDT 0.6B V2 (MLX, Encoder INT4)

A mixed-precision variant of Parakeet TDT 0.6B V2: the encoder is quantized to INT4, while the decoder and joint network remain in BF16. Optimized for Apple Silicon with a reduced memory footprint.
| Metric | BF16 Reference | This (INT4) |
|---|---|---|
| Peak memory | ~3GB | ~1.0GB |
| Weight size | ~1.2GB | ~330MB |
| WER impact | baseline | +0.3-0.5% expected |
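The roughly 3.6x weight-size reduction on quantized layers follows from MLX's affine quantization format: each group of 64 INT4 values also stores an FP16 scale and an FP16 bias, giving about 4.5 effective bits per weight versus 16 for BF16. A back-of-envelope sketch (the storage layout described here is MLX's standard affine scheme; the arithmetic is illustrative, not a measurement of this checkpoint):

```python
# Estimate the per-layer size reduction from INT4 affine quantization.
# Per group of `group_size` weights, MLX stores:
#   - group_size 4-bit quantized values
#   - one FP16 scale and one FP16 bias, amortized across the group
GROUP_SIZE = 64
BITS = 4

def effective_bits_per_weight(bits: int = BITS, group_size: int = GROUP_SIZE) -> float:
    """Quantized bits per weight, including amortized scale/bias overhead."""
    return bits + (16 + 16) / group_size

bf16_bits = 16
reduction = bf16_bits / effective_bits_per_weight()
print(f"{effective_bits_per_weight()} effective bits/weight, "
      f"{reduction:.2f}x smaller than BF16")  # ~4.5 bits, ~3.56x
```

Since only the encoder is quantized and the decoder and joint network stay in BF16, the whole-model reduction (~1.2GB to ~330MB) is somewhat less than the per-layer ratio would suggest.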
## Usage

```python
from parakeet import from_pretrained

model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2-int4")
result = model.transcribe("audio.wav")
```
## Quantization Details

- Encoder: INT4, `group_size=64`
- Decoder (PredictNetwork): BF16 (accuracy-critical)
- Joint network: BF16 (accuracy-critical)
- Method: post-training quantization via `mlx.nn.quantize`
- Source: sonic-speech/parakeet-tdt-0.6b-v2
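A selective scheme like this can be expressed with `mlx.nn.quantize`, which accepts a `class_predicate` callback deciding per module whether to quantize. A minimal sketch; the module-path prefixes `decoder` and `joint` are assumptions about this model's layout, not confirmed names:

```python
# Quantize encoder weights only, leaving decoder and joint network in BF16.
# NOTE: the "decoder"/"joint" path prefixes are assumed module names.

def should_quantize(path: str, module=None) -> bool:
    """Return True for modules that should be quantized (encoder only)."""
    return not (path.startswith("decoder") or path.startswith("joint"))

# With an MLX model loaded on Apple Silicon, this would apply INT4 PTQ:
#   import mlx.nn as nn
#   nn.quantize(model, group_size=64, bits=4, class_predicate=should_quantize)
```

Keeping the prediction and joint networks in BF16 is a common trade-off for transducer models, since those small networks contribute disproportionately to output accuracy while accounting for little of the total weight size.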
## Model tree for sonic-speech/parakeet-tdt-0.6b-v2-int4

- Base model: nvidia/parakeet-tdt-0.6b-v2
- Finetuned: sonic-speech/parakeet-tdt-0.6b-v2
- Quantized: sonic-speech/parakeet-tdt-0.6b-v2-int4 (this model)