S2-Pro NF4

S2-Pro overview

GitHub Fork | Upstream Fish Speech | Technical Report | Fish Audio

This repository hosts the Groxaxo NF4 release of Fish Audio S2-Pro for lower-VRAM inference.

  • Base model: Fish Audio S2-Pro
  • Relation: Quantized release
  • Format: bitsandbytes NF4 prequantized model.pth
  • Target hardware: practical single-GPU inference on 12 GB+ VRAM setups
  • Best paired with: groxaxo/fish-speech-int4-patch

This is a community-hosted release of the original Fish Audio model. Credit for the base model, research, and architecture belongs to the Fish Audio team.

Huge thanks to the original creators at Fish Audio and the upstream fishaudio/fish-speech project for building and open-sourcing S2-Pro.

If this NF4 release helps you, please star the companion GitHub project here:

https://github.com/groxaxo/fish-speech-int4-patch

The goal is simple: make the flagship S2-Pro experience easier to run, easier to share, and easier to deploy on real-world single-GPU machines.

What is in this repo

  • model.pth: prequantized NF4 checkpoint
  • codec.pth: codec weights
  • tokenizer/config assets needed by the patched loader

The checkpoint is meant to be loaded through the fork's bnb4 path. It is not a legacy int4 or int8 export.

Recommended usage

Use the patched repo that defaults to the right settings for this checkpoint:

git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch

./install_bnb4_3060.sh
./start_bnb4_3060.sh

These scripts start the API/WebUI with the intended defaults:

  • --bnb4
  • --half
  • lazy loading
  • s2-pro as the canonical model name

Why people use this release

  • lower-VRAM NF4 deployment path for S2-Pro
  • companion GitHub fork with API, WebUI, Docker, and export tooling
  • smoke-tested prequantized model.pth reload support
  • clearer self-hosting path for 12 GB and 24 GB cards

Quick commands

WebUI

git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh

API server

PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half \
  --host 0.0.0.0 \
  --port 8880

OpenAI-style request

curl http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "s2-pro",
    "input": "[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.",
    "voice": "default"
  }' \
  --output speech.wav
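The same request can be made from Python. Below is a minimal sketch using only the standard library, assuming the API server from the previous section is running on 127.0.0.1:8880 (the endpoint and payload fields mirror the curl example above):

```python
import json
import urllib.request

# Matches the endpoint used in the curl example above
API_URL = "http://127.0.0.1:8880/v1/audio/speech"

def build_speech_request(text: str, model: str = "s2-pro", voice: str = "default") -> urllib.request.Request:
    """Build an OpenAI-style /v1/audio/speech POST request for the local server."""
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_speech_request("[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.")
    # Requires the API server to be running locally:
    with urllib.request.urlopen(req) as resp:
        with open("speech.wav", "wb") as f:
            f.write(resp.read())
```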

Manual loading

If you want to point the repo at this checkpoint directly, keep --bnb4 enabled:

PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half

Or in Python:

import torch
from fish_speech.models.text2semantic.inference import init_model

model, decode_one_token = init_model(
    checkpoint_path="/path/to/s2-pro",
    device="cuda:0",
    precision=torch.float16,
    compile=False,
    bnb4=True,
)

Why this release exists

Upstream S2-Pro is excellent, but many single-card workstations do not have enough VRAM for a comfortable default setup. This NF4 release makes S2-Pro much easier to run on common cards like the RTX 3060 while preserving the flagship model path.

Model notes

  • S2-Pro uses a Dual-Autoregressive architecture with a 4B slow AR stack and a fast residual AR stack.
  • It supports fine-grained inline control with natural-language tags such as [whisper], [laugh], and [sad].
  • It supports multilingual generation, multi-speaker prompting, and strong voice cloning workflows.
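A rough back-of-envelope shows why NF4 makes the 12 GB target practical: at 4 bits per weight, the ~4B-parameter slow AR stack alone drops from roughly 7.5 GiB of weights at fp16 to under 2 GiB. The figures below are illustrative estimates only (NF4 also stores small per-block quantization constants, and activations, KV cache, and the codec add further VRAM on top):

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a given parameter count and precision."""
    return params * bits_per_weight / 8 / 2**30

SLOW_AR_PARAMS = 4e9  # ~4B slow AR stack (see model notes above)

for label, bits in [("fp16", 16), ("NF4", 4)]:
    print(f"{label}: ~{weight_gib(SLOW_AR_PARAMS, bits):.1f} GiB weights")
```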

Prompt examples

[whisper] We need to leave quietly before sunrise.
[excited] We actually got it working on a 12 GB card.
[sad] I waited for you at the station all night.
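The bracketed tags are plain-text prefixes inside the prompt itself. A small hypothetical helper (not part of the repo) that splits leading tags from the spoken text, which can be handy when building prompts programmatically:

```python
import re

# Matches one leading [tag] or [tag1, tag2] marker plus trailing whitespace
TAG_RE = re.compile(r"^\s*\[([^\]]+)\]\s*")

def split_tags(prompt: str) -> tuple[list[str], str]:
    """Split leading [tag] markers off a prompt, returning (tags, remaining text)."""
    tags: list[str] = []
    rest = prompt
    while (m := TAG_RE.match(rest)):
        # A single bracket may hold several comma-separated tags, e.g. [warm, calm]
        tags.extend(t.strip() for t in m.group(1).split(","))
        rest = rest[m.end():]
    return tags, rest

# e.g. split_tags("[warm, calm] Hello") -> (["warm", "calm"], "Hello")
```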

License

This model remains under the Fish Audio Research License. Research and non-commercial use are permitted under that license. Commercial use requires a separate agreement with Fish Audio.
