Higgs-Audio-v3-TTS-4bit-MLX

4-bit MLX quantized artifact for bosonai/higgs-audio-v3-tts-4b.

Scope

This artifact quantizes only the transformer body; it is not yet a complete drop-in runtime. The 2D attention and MLP weights under body.layers.* are quantized, while the Higgs audio tokenizer/vocoder, the fused modality embedding/head, norms, biases, and all non-2D tensors are preserved.
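The selection rule described above can be sketched as a simple predicate over tensor names and shapes. This is an illustrative sketch only: the function name should_quantize and the example tensor names are hypothetical, and the actual rule used for this artifact is whatever quantization_config.json records.

```python
def should_quantize(name: str, shape: tuple) -> bool:
    """Return True only for 2D weight matrices inside the transformer body.

    Assumption: quantized tensors are named 'body.layers.<i>....weight'.
    Norm weights and biases are 1D, so the 2D check already excludes them.
    """
    in_body = name.startswith("body.layers.")
    is_weight = name.endswith(".weight")
    is_2d = len(shape) == 2
    return in_body and is_weight and is_2d


# Hypothetical tensor names, for illustration only.
examples = {
    "body.layers.0.self_attn.q_proj.weight": (4096, 2560),  # quantize
    "body.layers.0.mlp.up_proj.weight": (9728, 2560),       # quantize
    "body.layers.0.input_layernorm.weight": (2560,),        # keep (1D norm)
    "body.layers.0.self_attn.q_proj.bias": (4096,),         # keep (bias)
    "audio_tokenizer.decoder.conv.weight": (512, 256, 7),   # keep (outside body)
}

selected = [n for n, s in examples.items() if should_quantize(n, s)]
```

Keeping the rule this narrow means everything outside the body, including the tokenizer/vocoder and the fused embedding/head, passes through untouched.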

Higgs Audio v3 TTS uses a custom HiggsMultimodalQwen3ForConditionalGeneration architecture with 8 audio codebooks, delayed multi-codebook generation, and waveform decode. Stock Transformers, as installed in the tested environment, does not instantiate this architecture, so runtime integration requires SGLang-Omni or a custom loader.
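For readers unfamiliar with delayed multi-codebook generation, the sketch below shows the delay pattern commonly used by codebook language models: codebook k is shifted right by k steps, so each decoding step emits one token per codebook at staggered time offsets. Whether Higgs Audio v3 uses exactly these offsets is an assumption; the function names and the PAD value are hypothetical and are not part of any Higgs or SGLang-Omni API.

```python
PAD = -1  # hypothetical padding token id, for illustration only


def apply_delay_pattern(codes, pad=PAD):
    """Shift codebook k right by k steps.

    codes: list of K rows, each a list of T token ids.
    Returns K rows of length T + K - 1.
    """
    num_codebooks = len(codes)
    num_steps = len(codes[0])
    total = num_steps + num_codebooks - 1
    return [
        [pad] * k + list(row) + [pad] * (total - num_steps - k)
        for k, row in enumerate(codes)
    ]


def revert_delay_pattern(delayed):
    """Undo apply_delay_pattern, recovering time-aligned codes for waveform decode."""
    num_codebooks = len(delayed)
    num_steps = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + num_steps] for k, row in enumerate(delayed)]


# 8 codebooks, 5 audio frames; token ids are arbitrary placeholders.
codes = [[100 * k + t for t in range(5)] for k in range(8)]
delayed = apply_delay_pattern(codes)
restored = revert_delay_pattern(delayed)
```

A custom loader would need to apply the inverse step before handing codes to the vocoder, since the vocoder expects all 8 codebooks aligned per frame.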

Quantization Report

  • Quantized tensors: 252
  • Quantized parameter fraction seen: 0.7805
  • Mean relative L2: 0.066956
  • Max relative L2: 0.103319
  • Max absolute error: 0.075130
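The error figures above can be reproduced in spirit with a small round-trip experiment. The sketch below assumes relative L2 means ||W - W_hat|| / ||W|| per tensor and uses a plain group-wise affine 4-bit scheme; the group size, the helper names, and the toy weights are all hypothetical, and the exact definitions used for this artifact are in quant_error_report.json and quantization_config.json.

```python
import math


def quantize_dequantize(values, bits=4, group_size=4):
    """Group-wise affine quantization followed by dequantization.

    Each group is mapped onto 2**bits evenly spaced levels between its
    minimum and maximum. group_size=4 is only for this toy example.
    """
    levels = (1 << bits) - 1
    restored = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0
        for v in group:
            q = round((v - lo) / scale)  # integer code in [0, levels]
            restored.append(lo + q * scale)
    return restored


def relative_l2(original, restored):
    """||original - restored||_2 / ||original||_2 (assumed definition)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den


def max_abs_error(original, restored):
    """Largest element-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(original, restored))


weights = [-0.5, -0.1, 0.2, 0.4, 0.05, 0.3, -0.25, 0.1]  # toy values
restored = quantize_dequantize(weights)
rel_err = relative_l2(weights, restored)
abs_err = max_abs_error(weights, restored)
```

With rounding to the nearest level, the per-element error is bounded by half the group's step size, which is why smaller groups or tighter value ranges lower both metrics.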

See:

  • quantization_config.json
  • quant_error_report.json
  • tensor_manifest.json

License

Released under the upstream Boson Higgs Audio v3 research and non-commercial license. Production, hosted-API, or revenue-generating use requires a separate commercial license from Boson AI.
