S2-Pro NF4

S2-Pro overview

GitHub Fork | Upstream Fish Speech | Technical Report | Fish Audio

This repository hosts the Groxaxo NF4 release of Fish Audio S2-Pro for lower-VRAM inference.

  • Base model: Fish Audio S2-Pro
  • Relation: Quantized release
  • Format: bitsandbytes NF4 prequantized model.pth
  • Target hardware: practical single-GPU inference on 12 GB+ VRAM setups
  • Best paired with: groxaxo/fish-speech-int4-patch

This is a community-hosted release of the original Fish Audio model. Credit for the base model, research, and architecture belongs to the Fish Audio team.

Huge thanks to the original creators at Fish Audio and the upstream fishaudio/fish-speech project for building and open-sourcing S2-Pro.

If this NF4 release helps you, please star the companion GitHub project here:

https://github.com/groxaxo/fish-speech-int4-patch

The goal is simple: make the flagship S2-Pro experience easier to run, easier to share, and easier to deploy on real-world single-GPU machines.

What is in this repo

  • model.pth: prequantized NF4 checkpoint
  • codec.pth: codec weights
  • tokenizer/config assets needed by the patched loader

The checkpoint is meant to be loaded through the fork's bnb4 path. It is not a legacy int4 or int8 export.

Recommended usage

Use the patched repo that defaults to the right settings for this checkpoint:

git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch

./install_bnb4_3060.sh
./start_bnb4_3060.sh

These scripts start the API/WebUI with the intended defaults:

  • --bnb4
  • --half
  • lazy loading
  • s2-pro as the canonical model name

Why people use this release

  • lower-VRAM NF4 deployment path for S2-Pro
  • companion GitHub fork with API, WebUI, Docker, and export tooling
  • smoke-tested prequantized model.pth reload support
  • clearer self-hosting path for 12 GB and 24 GB cards

Quick commands

WebUI

git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh

API server

PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half \
  --host 0.0.0.0 \
  --port 8880

OpenAI-style request

curl http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "s2-pro",
    "input": "[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.",
    "voice": "default"
  }' \
  --output speech.wav
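The same request can be made from Python. Below is a minimal sketch using only the standard library, assuming the API server from the previous section is running on 127.0.0.1:8880 (the endpoint and payload fields mirror the curl example above):

```python
import json
import urllib.request

# Matches the endpoint used in the curl example above
API_URL = "http://127.0.0.1:8880/v1/audio/speech"

def build_speech_request(text: str, model: str = "s2-pro", voice: str = "default") -> urllib.request.Request:
    """Build an OpenAI-style /v1/audio/speech POST request for the local server."""
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_speech_request("[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.")
    # Requires the API server to be running locally:
    with urllib.request.urlopen(req) as resp:
        with open("speech.wav", "wb") as f:
            f.write(resp.read())
```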

Manual loading

If you want to point the repo at this checkpoint directly, keep --bnb4 enabled:

PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half

Or in Python:

import torch
from fish_speech.models.text2semantic.inference import init_model

model, decode_one_token = init_model(
    checkpoint_path="/path/to/s2-pro",
    device="cuda:0",
    precision=torch.float16,
    compile=False,
    bnb4=True,
)

Why this release exists

Upstream S2-Pro is excellent, but many single-card workstations do not have enough VRAM for a comfortable default setup. This NF4 release makes S2-Pro much easier to run on common cards like the RTX 3060 while preserving the flagship model path.

Model notes

  • S2-Pro uses a Dual-Autoregressive architecture with a 4B slow AR stack and a fast residual AR stack.
  • It supports fine-grained inline control with natural-language tags such as [whisper], [laugh], and [sad].
  • It supports multilingual generation, multi-speaker prompting, and strong voice cloning workflows.
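A rough back-of-envelope shows why NF4 makes the 12 GB target practical: at 4 bits per weight, the ~4B-parameter slow AR stack alone drops from roughly 7.5 GiB of weights at fp16 to under 2 GiB. The figures below are illustrative estimates only (NF4 also stores small per-block quantization constants, and activations, KV cache, and the codec add further VRAM on top):

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a given parameter count and precision."""
    return params * bits_per_weight / 8 / 2**30

SLOW_AR_PARAMS = 4e9  # ~4B slow AR stack (see model notes above)

for label, bits in [("fp16", 16), ("NF4", 4)]:
    print(f"{label}: ~{weight_gib(SLOW_AR_PARAMS, bits):.1f} GiB weights")
```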

Prompt examples

[whisper] We need to leave quietly before sunrise.
[excited] We actually got it working on a 12 GB card.
[sad] I waited for you at the station all night.
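The bracketed tags are plain-text prefixes inside the prompt itself. A small hypothetical helper (not part of the repo) that splits leading tags from the spoken text, which can be handy when building prompts programmatically:

```python
import re

# Matches one leading [tag] or [tag1, tag2] marker plus trailing whitespace
TAG_RE = re.compile(r"^\s*\[([^\]]+)\]\s*")

def split_tags(prompt: str) -> tuple[list[str], str]:
    """Split leading [tag] markers off a prompt, returning (tags, remaining text)."""
    tags: list[str] = []
    rest = prompt
    while (m := TAG_RE.match(rest)):
        # A single bracket may hold several comma-separated tags, e.g. [warm, calm]
        tags.extend(t.strip() for t in m.group(1).split(","))
        rest = rest[m.end():]
    return tags, rest

# e.g. split_tags("[warm, calm] Hello") -> (["warm", "calm"], "Hello")
```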

License

This model remains under the Fish Audio Research License. Research and non-commercial use are permitted under that license. Commercial use requires a separate agreement with Fish Audio.
