Higgs-Audio-v3-TTS-4bit-MLX

4-bit MLX quantization of bosonai/higgs-audio-v3-tts-4b.

Scope

This is a quantized transformer-body artifact, not yet a complete drop-in runtime. Only the 2D attention and MLP weight matrices under body.layers.* are quantized. The Higgs audio tokenizer/vocoder, the fused modality embedding and head, norms, biases, and all non-2D tensors are left unquantized.
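The quantization scope above amounts to a per-tensor selection rule. The sketch below expresses that rule as a predicate over tensor names and shapes. The Qwen3-style projection names (`q_proj`, `gate_proj`, and so on) and the helper `should_quantize` are hypothetical assumptions for illustration; the authoritative list of quantized tensors is in tensor_manifest.json.

```python
import re

# Hypothetical naming scheme: assumes Qwen3-style projection names inside body.layers.N.
_BODY_ATTN_MLP_WEIGHT = re.compile(
    r"^body\.layers\.\d+\."
    r"(self_attn\.(q|k|v|o)_proj|mlp\.(gate|up|down)_proj)"
    r"\.weight$"
)


def should_quantize(name: str, shape: tuple) -> bool:
    """Return True if a tensor falls inside the quantized scope described above.

    Only 2D attention/MLP weight matrices under body.layers.* qualify; norms,
    biases, non-2D tensors, and anything outside the transformer body (audio
    tokenizer/vocoder, fused modality embedding/head) are left unquantized.
    """
    return len(shape) == 2 and _BODY_ATTN_MLP_WEIGHT.match(name) is not None


if __name__ == "__main__":
    # Tensor names and shapes below are illustrative, not read from the checkpoint.
    for name, shape in [
        ("body.layers.0.self_attn.q_proj.weight", (1024, 1024)),
        ("body.layers.0.mlp.down_proj.weight", (1024, 4096)),
        ("body.layers.0.input_layernorm.weight", (1024,)),
        ("audio_tokenizer.decoder.conv.weight", (64, 64, 3)),
    ]:
        print(name, should_quantize(name, shape))
```

Checking the dimensionality first keeps 1D norms and biases out even if their names happen to match a broader pattern.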

Higgs Audio v3 TTS uses a custom HiggsMultimodalQwen3ForConditionalGeneration architecture with 8 audio codebooks, delayed multi-codebook generation, and waveform decoding. The stock Transformers release in the tested environment cannot instantiate this architecture, so runtime integration must go through SGLang-Omni or a custom loader.
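A custom loader has to handle the delayed multi-codebook layout. The sketch below assumes the common MusicGen-style delay pattern, in which codebook k lags codebook 0 by k frames; the exact offsets, padding token IDs, and the helper names `apply_delay_pattern` / `revert_delay_pattern` are assumptions, not taken from the Higgs Audio v3 code.

```python
import numpy as np


def apply_delay_pattern(codes: np.ndarray, pad_id: int) -> np.ndarray:
    """Shift codebook k right by k frames (assumed MusicGen-style delay).

    codes: integer array of shape (num_codebooks, num_frames).
    Returns an array of shape (num_codebooks, num_frames + num_codebooks - 1),
    with pad_id filling the positions each codebook has not reached yet.
    """
    num_codebooks, num_frames = codes.shape
    out = np.full((num_codebooks, num_frames + num_codebooks - 1), pad_id, dtype=codes.dtype)
    for k in range(num_codebooks):
        out[k, k:k + num_frames] = codes[k]
    return out


def revert_delay_pattern(delayed: np.ndarray) -> np.ndarray:
    """Undo apply_delay_pattern, realigning all codebooks frame by frame."""
    num_codebooks, total = delayed.shape
    num_frames = total - num_codebooks + 1
    return np.stack([delayed[k, k:k + num_frames] for k in range(num_codebooks)])


if __name__ == "__main__":
    codes = np.arange(8 * 5).reshape(8, 5)  # 8 codebooks, 5 frames of toy codes
    delayed = apply_delay_pattern(codes, pad_id=-1)
    print(delayed.shape)
    print(np.array_equal(revert_delay_pattern(delayed), codes))
```

With the delay, each generation step emits one token per codebook, but codebook k at step t refers to frame t - k, so coarser codebooks are already fixed when finer ones are predicted. The realigned frames are what a waveform decoder would consume.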

Quantization Report

Quantized tensors: 252
Quantized parameter fraction seen: 0.7805
Mean relative L2: 0.066956
Max relative L2: 0.103319
Max absolute error: 0.075130
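The error figures above can be understood through a quantize/dequantize round trip. The sketch below assumes group-wise affine 4-bit quantization (one scale and offset per group, similar in spirit to MLX's default affine mode but not a byte-exact reimplementation), a group size of 64 (an assumption; the actual settings are in quantization_config.json), and per-tensor relative L2 defined as ||W - Ŵ|| / ||W||. The mean and max rows would then be aggregates over the 252 quantized tensors. `quantized_fraction` is a hypothetical helper for the parameter-fraction row.

```python
import numpy as np


def quantize_dequantize(w: np.ndarray, bits: int = 4, group_size: int = 64) -> np.ndarray:
    """Affine group-wise quantization round trip along the last axis (assumed scheme)."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    levels = 2 ** bits - 1
    g = w.reshape(rows, cols // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for constant groups
    q = np.clip(np.round((g - w_min) / scale), 0, levels)  # integer codes in [0, levels]
    return (q * scale + w_min).reshape(rows, cols)


def error_stats(w: np.ndarray, w_hat: np.ndarray) -> dict:
    """Per-tensor relative L2 and max absolute error between original and dequantized weights."""
    diff = w - w_hat
    return {
        "relative_l2": float(np.linalg.norm(diff) / np.linalg.norm(w)),
        "max_abs": float(np.abs(diff).max()),
    }


def quantized_fraction(shapes: dict, quantized_names: list) -> float:
    """Share of all parameters that live in quantized tensors."""
    total = sum(int(np.prod(s)) for s in shapes.values())
    quant = sum(int(np.prod(shapes[n])) for n in quantized_names)
    return quant / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=(256, 512)).astype(np.float32)  # synthetic weights
    print(error_stats(w, quantize_dequantize(w)))
```

Within each group the rounding error is at most half a quantization step, so groups with a wide value range dominate the max absolute error, while relative L2 reflects the average distortion across the whole tensor.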

See:

quantization_config.json
quant_error_report.json
tensor_manifest.json

License

Released under the upstream Boson Higgs Audio v3 research and non-commercial license. Production deployment, hosted APIs, and any revenue-generating use require a separate commercial license from Boson AI.


Model tree: Reza2kn/Higgs-Audio-v3-TTS-4bit-MLX is a quantized derivative of the base model bosonai/higgs-audio-v3-tts-4b.