Higgs-Audio-v3-TTS-4bit-NVFP4

4-bit NVFP4 artifact for bosonai/higgs-audio-v3-tts-4b.

Scope

This artifact quantizes only the transformer body; it is not yet a complete drop-in runtime. The 2D attention and MLP weights under body.layers.* are quantized, while the Higgs audio tokenizer/vocoder, the fused modality embedding/head, norms, biases, and all non-2D tensors are preserved.
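The selection rule described above can be sketched as a small predicate over tensor names and shapes. This is only an illustration, not the script used to build this artifact: the name fragments "attn" and "mlp", the ".weight" suffix check, and the example tensor names are assumptions. tensor_manifest.json is the authoritative list of what was quantized.

```python
from typing import Sequence

# Assumed name fragments for attention and MLP projections (hypothetical,
# not read from tensor_manifest.json).
QUANT_FRAGMENTS = ("attn", "mlp")


def should_quantize(name: str, shape: Sequence[int]) -> bool:
    """Return True for 2D attention/MLP weights under body.layers.*."""
    if not name.startswith("body.layers."):
        return False  # tokenizer/vocoder, fused embedding/head, etc. are preserved
    if not name.endswith(".weight"):
        return False  # biases are preserved
    if len(shape) != 2:
        return False  # non-2D tensors are preserved
    if "norm" in name:
        return False  # norms are preserved
    return any(fragment in name for fragment in QUANT_FRAGMENTS)
```

For example, under these assumptions a tensor named body.layers.0.self_attn.q_proj.weight with a 2D shape would be selected, while a layer-norm weight or any bias would be skipped.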

Higgs Audio v3 TTS uses a custom HiggsMultimodalQwen3ForConditionalGeneration architecture with 8 audio codebooks, delayed multi-codebook generation, and waveform decoding. The stock Transformers release in the tested environment does not instantiate this architecture, so runtime integration must go through SGLang-Omni or a custom loader.
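To illustrate what delayed multi-codebook generation means, here is a minimal sketch of a delay pattern in which codebook k is shifted right by k steps, as used by some multi-codebook audio models. Whether Higgs Audio v3 uses exactly this one-step-per-codebook offset is an assumption, and the PAD value and function names are hypothetical.

```python
from typing import List

PAD = -1  # hypothetical placeholder id for positions that hold no code


def apply_delay(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift codebook k right by k steps. codes is [num_codebooks][num_frames]."""
    num_codebooks = len(codes)
    delayed = []
    for k, row in enumerate(codes):
        # k pads in front, and enough pads behind so all rows share one length
        delayed.append([pad] * k + list(row) + [pad] * (num_codebooks - 1 - k))
    return delayed


def undo_delay(delayed: List[List[int]]) -> List[List[int]]:
    """Invert apply_delay, recovering frame-aligned codes for waveform decoding."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]
```

With 8 codebooks, a pattern like this would add 7 extra generation steps, and a loader would need to undo the shift before handing the codes to the vocoder.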

Quantization Report

Quantized tensors: 252
Quantized parameter fraction seen: 0.7805
Mean relative L2: 0.097363
Max relative L2: 0.101345
Max absolute error: 0.140625
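For readers interpreting the metrics above, the following sketch shows one common way to compute a relative L2 error and a maximum absolute error between an original tensor and its dequantized counterpart. The exact definitions behind quant_error_report.json are not stated here, so treat this as an assumption; the function names are hypothetical and the tensors are represented as flat lists for simplicity.

```python
import math
from typing import Sequence


def relative_l2(original: Sequence[float], restored: Sequence[float]) -> float:
    """||original - restored||_2 / ||original||_2 over flattened values."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(original, restored))
    ref_sq = sum(a * a for a in original)
    if ref_sq == 0.0:
        raise ValueError("relative L2 is undefined for an all-zero reference")
    return math.sqrt(diff_sq) / math.sqrt(ref_sq)


def max_abs_error(original: Sequence[float], restored: Sequence[float]) -> float:
    """Largest elementwise absolute difference."""
    return max(abs(a - b) for a, b in zip(original, restored))
```

Under this definition, a mean relative L2 near 0.097 would mean the dequantized weights deviate from the originals by roughly 10% of the original tensor norm on average.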

See:

quantization_config.json
quant_error_report.json
tensor_manifest.json

Persian TTS Runtime Note

In Persian generation tests run through SGLang-Omni, the NVFP4 weights produced good spoken audio quality after dequantized runtime validation. Raw generation can append a long, near-silent tail after the spoken utterance, so the accompanying Space/server wrapper enables trailing-silence trimming by default.
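A minimal sketch of trailing-silence trimming is shown below. It is not the wrapper's actual implementation: the amplitude threshold, the amount of padding kept after the last audible sample, and the function name are all assumptions chosen for illustration.

```python
from typing import List


def trim_trailing_silence(
    samples: List[float],
    sample_rate: int,
    threshold: float = 0.01,    # assumed amplitude below which audio counts as silent
    keep_seconds: float = 0.1,  # assumed padding kept after the last audible sample
) -> List[float]:
    """Drop the near-silent tail of a mono waveform with values in [-1, 1]."""
    last_loud = -1
    # Scan backwards for the last sample above the threshold.
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            last_loud = i
            break
    if last_loud < 0:
        return []  # the whole clip is below the threshold
    end = min(len(samples), last_loud + 1 + int(keep_seconds * sample_rate))
    return samples[:end]
```

A per-sample threshold is the simplest choice; a production wrapper might instead use a windowed RMS energy measure to be more robust to isolated clicks in the tail.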

See:

nvfp4_persian_tts_trim_report.json
samples/nvfp4_dequant_fa_trimmed.wav

License

Released under the upstream Boson Higgs Audio v3 research and non-commercial license. Production use, hosted APIs, and revenue-generating use require a separate commercial license from Boson AI.

Model tree for Reza2kn/Higgs-Audio-v3-TTS-4bit-NVFP4

Base model: bosonai/higgs-audio-v3-tts-4b
Quantized: this model