foryoung365/Fun-ASR-Nano-2512-int4-onnx

Automatic Speech Recognition

Fun-ASR-Nano-2512 INT4 ONNX for sherpa-onnx

This repository contains a locally quantized INT4 ONNX variant of FunAudioLLM/Fun-ASR-Nano-2512, prepared for sherpa-onnx offline inference.

Important Notes

This is not an official release from FunAudioLLM, ModelScope, or k2-fsa.
The INT4 weights were generated locally from the fp32 ONNX package distributed in the k2-fsa/sherpa-onnx ASR model release assets.
At the time of writing, the upstream model card does not declare a license in its Hugging Face YAML metadata. Verify the upstream usage terms before redistributing these files or using them commercially.

Source and Lineage

Upstream model: FunAudioLLM/Fun-ASR-Nano-2512
- https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512
ONNX export lineage referenced by upstream sherpa-onnx package:
- https://www.modelscope.cn/models/zengshuishui/FunASR-nano-onnx/files
- https://github.com/Wasser1462/FunASR-nano-onnx
fp32 package used as quantization source:
- sherpa-onnx-funasr-nano-2025-12-30.tar.bz2
- from https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models

Files

encoder_adaptor.int4.onnx
embedding.int4.onnx
llm.int4.onnx
Qwen3-0.6B/
- tokenizer.json
- merges.txt
- vocab.json
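
Before loading the model, it can help to confirm that a local copy matches the layout listed above. The following is a minimal sketch: the helper name `missing_files` and the `model_dir` argument are illustrative and are not part of sherpa-onnx.

```python
from pathlib import Path

# Relative paths taken from the file list in this model card.
EXPECTED_FILES = [
    "encoder_adaptor.int4.onnx",
    "embedding.int4.onnx",
    "llm.int4.onnx",
    "Qwen3-0.6B/tokenizer.json",
    "Qwen3-0.6B/merges.txt",
    "Qwen3-0.6B/vocab.json",
]


def missing_files(model_dir):
    """Return the expected relative paths that are not regular files under model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Hypothetical usage: "." stands for wherever the repository was downloaded.
# print(missing_files("."))
```

An empty list means every expected file was found. The check does not verify that the files are valid ONNX models.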

Quantization Method

Quantizer: onnxruntime.quantization.matmul_nbits_quantizer.MatMulNBitsQuantizer
Quantization type: weight-only INT4
Scope: MatMul weights
Output format: single-file ONNX artifacts compatible with local sherpa-onnx loading
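
For readers who want to reproduce a similar artifact, the sketch below shows one way to apply the quantizer named above to a single fp32 ONNX file. It is not the exact script used for this repository. The `block_size` and `is_symmetric` values are assumptions, since this card does not state them. The input file name in the usage comment is hypothetical, and the helpers `int4_name` and `quantize_to_int4` are illustrative.

```python
from pathlib import Path


def int4_name(fp32_path):
    """Map e.g. 'llm.onnx' to 'llm.int4.onnx' in the same directory."""
    p = Path(fp32_path)
    return p.with_name(p.stem + ".int4" + p.suffix)


def quantize_to_int4(fp32_path, block_size=32, symmetric=True):
    """Weight-only INT4 quantization of MatMul weights (assumed settings)."""
    # Imported lazily so the naming helper works without these packages.
    import onnx
    from onnxruntime.quantization.matmul_nbits_quantizer import (
        MatMulNBitsQuantizer,
    )

    model = onnx.load(str(fp32_path))
    quantizer = MatMulNBitsQuantizer(
        model,
        block_size=block_size,   # assumption: not stated in this card
        is_symmetric=symmetric,  # assumption: not stated in this card
    )
    quantizer.process()

    out_path = int4_name(fp32_path)
    # Save as a single file, matching the output format described above.
    quantizer.model.save_model_to_file(
        str(out_path), use_external_data_format=False
    )
    return out_path


# Hypothetical usage; the fp32 file name inside the upstream package may differ.
# quantize_to_int4("llm.onnx")
```

Smaller block sizes generally trade file size for accuracy, so the settings are worth comparing against the fp32 output on real audio.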

Local Validation

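This INT4 variant was checked locally on Windows with a CUDA-enabled sherpa-onnx build, using a single smoke test.

Reference smoke test result on rag_math.wav. All three variants produce the same transcript, which translates to "Integration of differential forms is a fundamental concept in differential geometry.":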

fp32: 对微分形式的积分是微分几何中的基本概念。
int8: 对微分形式的积分是微分几何中的基本概念。
int4: 对微分形式的积分是微分几何中的基本概念。
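
An identical transcript on one file is only a smoke test. When comparing variants on more audio, one common way to quantify drift is the character error rate (CER) of the quantized output against the fp32 output. The sketch below is illustrative and is not the procedure used for this card; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # deletion
                    cur[j - 1] + 1,            # insertion
                    prev[j - 1] + (ca != cb),  # substitution or match
                )
            )
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate of hypothesis relative to a non-empty reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Transcripts reported in this card for rag_math.wav.
fp32_text = "对微分形式的积分是微分几何中的基本概念。"
int4_text = "对微分形式的积分是微分几何中的基本概念。"
print(cer(fp32_text, int4_text))
```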

Example Usage

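import sherpa_onnx

# Paths are relative to the directory that holds the files listed under "Files".
recognizer = sherpa_onnx.OfflineRecognizer.from_funasr_nano(
    encoder_adaptor="encoder_adaptor.int4.onnx",
    embedding="embedding.int4.onnx",
    llm="llm.int4.onnx",
    tokenizer="Qwen3-0.6B",  # tokenizer directory, not a single file
    provider="cuda",  # the provider used in the local test
    num_threads=1,
)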
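
The snippet above only constructs the recognizer. A minimal decoding sketch follows, using the stream pattern from the sherpa-onnx Python examples (`create_stream`, `accept_waveform`, `decode_stream`, `result.text`). It assumes a mono 16-bit PCM WAV file and that numpy is installed. The helper names `read_wav_mono_16bit` and `transcribe` are illustrative and not part of sherpa-onnx.

```python
import struct
import wave


def read_wav_mono_16bit(path):
    """Return (sample_rate, samples) with samples as floats in [-1, 1)."""
    with wave.open(str(path), "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected a mono 16-bit PCM WAV file")
        sample_rate = f.getframerate()
        raw = f.readframes(f.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return sample_rate, [s / 32768.0 for s in ints]


def transcribe(recognizer, wav_path):
    """Decode one file with an already constructed offline recognizer."""
    import numpy as np  # imported lazily; only needed for decoding

    sample_rate, samples = read_wav_mono_16bit(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text


# Usage with the recognizer built above:
# print(transcribe(recognizer, "rag_math.wav"))
```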

Compatibility

Intended for sherpa-onnx offline inference
Tested locally with sherpa-onnx==1.12.39+cuda12.cudnn9
Tested locally with onnxruntime-gpu==1.24.4
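
To see whether a local environment matches the tested versions, a small check along these lines may help. It is a sketch: `compare_versions` is an illustrative helper, and a version mismatch does not by itself mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as tested in this model card.
TESTED = {
    "sherpa-onnx": "1.12.39+cuda12.cudnn9",
    "onnxruntime-gpu": "1.24.4",
}


def compare_versions(tested, lookup=version):
    """Map each package to (tested, installed or None, exact match)."""
    report = {}
    for name, expected in tested.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, same) in compare_versions(TESTED).items():
    print(f"{name}: tested={expected} installed={installed} match={same}")
```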

Repository Purpose

This repository packages the INT4 model as a ready-to-use deployment artifact for local or private inference workflows.
