Higgs-Audio-v3-TTS-4bit-NVFP4

4-bit NVFP4 artifact for bosonai/higgs-audio-v3-tts-4b.

Scope

This artifact quantizes only the transformer body; it is not yet a complete drop-in runtime. The 2D attention and MLP weights under body.layers.* are quantized to NVFP4, while the Higgs audio tokenizer/vocoder, the fused modality embedding/head, norms, biases, and all non-2D tensors are left unquantized.
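The selection rule described above can be sketched as a small predicate. This is an illustration only, not the script used to build this artifact: the `.weight` suffix check and the example tensor names are assumptions, and tensor_manifest.json remains the authoritative list of what was quantized.

```python
def should_quantize(name, shape):
    """Return True for tensors the scope above describes as quantized.

    Assumptions (not confirmed by the repository): quantized tensors live
    under the "body.layers." prefix and their names end in ".weight".
    """
    in_body = name.startswith("body.layers.")
    is_weight = name.endswith(".weight")
    is_2d = len(shape) == 2  # norms and biases are 1D and are skipped
    return in_body and is_weight and is_2d


# Hypothetical tensor names, used only to exercise the rule.
examples = {
    "body.layers.0.self_attn.q_proj.weight": (4096, 2560),
    "body.layers.0.input_layernorm.weight": (2560,),
    "body.layers.0.self_attn.q_proj.bias": (4096,),
    "audio_tokenizer.decoder.conv.weight": (512, 256, 7),
}
selected = [n for n, s in examples.items() if should_quantize(n, s)]
```

With these made-up names, only the 2D projection weight inside body.layers.* is selected.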

Higgs Audio v3 TTS uses a custom HiggsMultimodalQwen3ForConditionalGeneration architecture with 8 audio codebooks, delayed multi-codebook generation, and waveform decode. The vanilla Transformers build in the tested environment does not instantiate this architecture, so runtime integration must go through SGLang-Omni or a custom loader.
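For readers unfamiliar with delayed multi-codebook generation, the sketch below shows one common form of the idea: codebook k is shifted right by k steps so that each step can condition on earlier codebooks. The one-step-per-codebook offset and the `PAD` placeholder are assumptions for illustration; the exact pattern used by Higgs Audio v3 is not stated here and may differ.

```python
PAD = -1  # hypothetical placeholder id for positions with no code yet


def apply_delay(codes, pad=PAD):
    """Shift row i right by i steps. Input: K rows of length T.

    Output: K rows of length T + K - 1, padded so all rows align.
    """
    k = len(codes)
    return [[pad] * i + list(row) + [pad] * (k - 1 - i)
            for i, row in enumerate(codes)]


def revert_delay(delayed):
    """Undo apply_delay, recovering K rows of length T."""
    k = len(delayed)
    t = len(delayed[0]) - (k - 1)
    return [row[i:i + t] for i, row in enumerate(delayed)]


# Two codebooks, three frames (the model itself uses 8 codebooks).
codes = [[1, 2, 3], [4, 5, 6]]
delayed = apply_delay(codes)      # [[1, 2, 3, -1], [-1, 4, 5, 6]]
restored = revert_delay(delayed)  # [[1, 2, 3], [4, 5, 6]]
```

A custom loader has to revert whatever delay layout the model emits before handing the codes to the vocoder for waveform decode.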

Quantization Report

  • Quantized tensors: 252
  • Quantized parameter fraction seen: 0.7805
  • Mean relative L2: 0.097363
  • Max relative L2: 0.101345
  • Max absolute error: 0.140625
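As a rough guide to how metrics like these can be computed, the sketch below fake-quantizes a list of values to 4-bit E2M1 levels with one scale per block of 16 elements, then measures relative L2 and maximum absolute error. It is a simplified assumption-laden illustration, not the code that produced quant_error_report.json: the block scale is kept in full precision here, whereas NVFP4 stores block scales in FP8, and the helper names are hypothetical.

```python
import math

# Magnitudes representable by the 4-bit E2M1 floating-point format.
E2M1_LEVELS = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 elements


def fake_quantize_block(block):
    """Round one block to E2M1 levels using a single shared scale."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_LEVELS[-1]  # simplification: scale is not rounded to FP8
    out = []
    for x in block:
        nearest = min(E2M1_LEVELS, key=lambda q: abs(abs(x) / scale - q))
        out.append(math.copysign(nearest * scale, x))
    return out


def fake_quantize(values):
    """Quantize and immediately dequantize a flat list of floats."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        out.extend(fake_quantize_block(values[i:i + BLOCK_SIZE]))
    return out


def relative_l2(original, restored):
    """||original - restored||_2 / ||original||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den if den else 0.0


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


weights = [6.0, 2.4, -1.5, 0.5]
restored = fake_quantize(weights)
print(relative_l2(weights, restored), max_abs_error(weights, restored))
```

In this toy block only 2.4 is not representable and rounds to 2.0, so the error comes entirely from that one element.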

See:

  • quantization_config.json
  • quant_error_report.json
  • tensor_manifest.json

Persian TTS Runtime Note

Persian generation tests through SGLang-Omni produced good spoken audio quality when the NVFP4 weights were dequantized for runtime validation. Raw generation can append a long, near-silent tail after the spoken utterance, so the accompanying Space/server wrapper enables trailing-silence trimming by default.
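One simple way to trim such a tail is to find the last frame whose RMS energy exceeds a threshold and cut shortly after it. The sketch below is a minimal illustration under assumed parameters; the function name, the -45 dB threshold, the 20 ms frame, and the 200 ms of retained padding are all hypothetical and are not taken from the wrapper itself.

```python
import math


def trim_trailing_silence(samples, sample_rate,
                          threshold_db=-45.0, frame_ms=20, keep_ms=200):
    """Drop near-silent audio after the last frame above threshold_db.

    samples: mono float samples in [-1.0, 1.0].
    All default values are illustrative assumptions.
    """
    frame = max(1, sample_rate * frame_ms // 1000)
    threshold = 10 ** (threshold_db / 20)  # dBFS to linear amplitude
    last_voiced_end = 0
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms > threshold:
            last_voiced_end = start + len(chunk)
    keep = sample_rate * keep_ms // 1000  # short pad so the ending is not clipped
    return samples[:min(len(samples), last_voiced_end + keep)]


# 0.1 s of signal followed by 1 s of silence at a toy 1 kHz sample rate.
audio = [0.5] * 100 + [0.0] * 1000
trimmed = trim_trailing_silence(audio, sample_rate=1000)
print(len(audio), len(trimmed))
```

Scanning every frame, rather than stopping at the first quiet one, keeps natural pauses inside the utterance intact and removes only what follows the last voiced frame.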

See:

  • nvfp4_persian_tts_trim_report.json
  • samples/nvfp4_dequant_fa_trimmed.wav

License

Released under the upstream Boson Higgs Audio v3 research and non-commercial license. Production use, hosted APIs, and any revenue-generating use require a separate commercial license from Boson AI.
