S2-Pro NF4
GitHub Fork | Upstream Fish Speech | Technical Report | Fish Audio
This repository hosts the Groxaxo NF4 release of Fish Audio S2-Pro for lower-VRAM inference.
- Base model: Fish Audio S2-Pro
- Relation: Quantized release
- Format: bitsandbytes NF4 prequantized `model.pth`
- Target hardware: practical single-GPU inference on 12 GB+ VRAM setups
- Best paired with: groxaxo/fish-speech-int4-patch
This is a community-hosted release of the original Fish Audio model. Credit for the base model, research, and architecture belongs to the Fish Audio team.
Huge thanks to the original creators at Fish Audio and the upstream fishaudio/fish-speech project for building and open-sourcing S2-Pro.
If this NF4 release helps you, please star the companion GitHub project here:
https://github.com/groxaxo/fish-speech-int4-patch
The goal is simple: make the flagship S2-Pro experience easier to run, easier to share, and easier to deploy on real-world single-GPU machines.
What is in this repo
- `model.pth`: prequantized NF4 checkpoint
- `codec.pth`: codec weights
- tokenizer/config assets needed by the patched loader
The checkpoint is meant to be loaded through the fork's bnb4 path. It is not a legacy int4 or int8 export.
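For intuition about what the bnb4 path stores: NF4 represents each weight with one of 16 levels placed at quantiles of a standard normal distribution, scaled per block by the block's absolute maximum. A rough, self-contained sketch of the idea (an illustration only, not the actual bitsandbytes codebook or kernel):

```python
from statistics import NormalDist

# 16 NF4-style levels: quantiles of N(0, 1), rescaled into [-1, 1].
# Illustrative only; bitsandbytes uses its own fixed codebook.
nd = NormalDist()
qs = [nd.inv_cdf((i + 0.5) / 16) for i in range(16)]
levels = [q / max(abs(q) for q in qs) for q in qs]

def quantize_block(weights):
    """Absmax-scale a block, then snap each value to the nearest level."""
    scale = max(abs(w) for w in weights) or 1.0
    codes = [min(range(16), key=lambda i: abs(w / scale - levels[i]))
             for w in weights]
    return codes, scale  # 4-bit codes plus one scale per block

def dequantize_block(codes, scale):
    return [levels[c] * scale for c in codes]

block = [0.02, -0.31, 0.17, 0.44, -0.05, 0.29, -0.18, 0.08]
codes, scale = quantize_block(block)
approx = dequantize_block(codes, scale)
```

Each weight costs 4 bits plus a shared per-block scale, which is where the VRAM savings over fp16 come from.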
Recommended usage
Use the patched repo that defaults to the right settings for this checkpoint:
```shell
git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh
```
That path starts the API/WebUI with the intended defaults:
- `--bnb4`
- `--half`
- lazy loading
- `s2-pro` as the canonical model name
Why people use this release
- lower-VRAM NF4 deployment path for S2-Pro
- companion GitHub fork with API, WebUI, Docker, and export tooling
- smoke-tested prequantized `model.pth` reload support
- clearer self-hosting path for 12 GB and 24 GB cards
Quick commands
WebUI
```shell
git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh
```
API server
```shell
PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half \
  --host 0.0.0.0 \
  --port 8880
```
OpenAI-style request
```shell
curl http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "s2-pro",
    "input": "[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.",
    "voice": "default"
  }' \
  --output speech.wav
```
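The same request can be issued from Python. A minimal sketch using only the standard library, assuming the API server from the previous step is listening on 127.0.0.1:8880:

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "model": "s2-pro",
    "input": "[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.",
    "voice": "default",
}

req = urllib.request.Request(
    "http://127.0.0.1:8880/v1/audio/speech",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the API server is running:
# with urllib.request.urlopen(req) as resp, open("speech.wav", "wb") as f:
#     f.write(resp.read())
```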
Manual loading
If you want to point the repo at this checkpoint directly, keep `--bnb4` enabled:

```shell
PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half
```
Or in Python:
```python
import torch

from fish_speech.models.text2semantic.inference import init_model

model, decode_one_token = init_model(
    checkpoint_path="/path/to/s2-pro",
    device="cuda:0",
    precision=torch.float16,
    compile=False,
    bnb4=True,
)
```
Why this release exists
Upstream S2-Pro is excellent, but many single-card workstations do not have enough VRAM for a comfortable default setup. This NF4 release makes S2-Pro much easier to run on common cards like the RTX 3060 while preserving the flagship model path.
Model notes
- S2-Pro uses a Dual-Autoregressive architecture with a 4B slow AR stack and a fast residual AR stack.
- It supports fine-grained inline control with natural-language tags such as `[whisper]`, `[laugh]`, and `[sad]`.
- It supports multilingual generation, multi-speaker prompting, and strong voice cloning workflows.
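A back-of-the-envelope calculation shows why NF4 fits the 12 GB target. A rough estimate for the ~4B-parameter slow AR stack alone (ignoring the fast stack, codec, activations, and KV cache; the ~0.5 extra bits per weight approximate per-block scale overhead):

```python
params = 4e9  # ~4B parameters in the slow AR stack

def weight_gb(params, bits_per_param):
    """Approximate weight memory in GB (1 GB = 2**30 bytes)."""
    return params * bits_per_param / 8 / 2**30

fp16_gb = weight_gb(params, 16)   # full half-precision weights
nf4_gb = weight_gb(params, 4.5)   # 4-bit codes plus per-block scales

print(f"fp16: {fp16_gb:.1f} GB, NF4: {nf4_gb:.1f} GB")
```

Roughly 7.5 GB of fp16 weights shrink to about 2 GB in NF4, leaving comfortable headroom on a 12 GB card.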
Prompt examples
```
[whisper] We need to leave quietly before sunrise.
[excited] We actually got it working on a 12 GB card.
[sad] I waited for you at the station all night.
```
Links
- Groxaxo fork README
- Star the GitHub project
- GitHub issues and feature requests
- Fish Audio blog post
- Fish Audio S2 technical report
License
This model remains under the Fish Audio Research License. Research and non-commercial use are permitted under that license. Commercial use requires a separate agreement with Fish Audio.