Fun-ASR-Nano-2512 INT4 ONNX for sherpa-onnx

This repository contains a locally quantized INT4 ONNX variant of FunAudioLLM/Fun-ASR-Nano-2512, prepared for sherpa-onnx offline inference.

Important Notes

  • This is not an official release from FunAudioLLM, ModelScope, or k2-fsa.
  • The INT4 weights were generated locally from the fp32 ONNX package distributed in the k2-fsa/sherpa-onnx ASR model release assets.
  • The upstream model card does not currently declare a license in its Hugging Face YAML metadata. Verify the upstream usage terms before redistributing the model or using it commercially.

Source and Lineage

Files

  • encoder_adaptor.int4.onnx
  • embedding.int4.onnx
  • llm.int4.onnx
  • Qwen3-0.6B/
    • tokenizer.json
    • merges.txt
    • vocab.json
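
Before constructing a recognizer, it can help to confirm that a downloaded copy contains every file listed above. The following is a minimal sketch using only the Python standard library; the helper name `missing_model_files` and its `model_dir` argument are illustrative and not part of sherpa-onnx.

```python
from pathlib import Path

# Relative paths taken from the file list above.
EXPECTED_FILES = [
    "encoder_adaptor.int4.onnx",
    "embedding.int4.onnx",
    "llm.int4.onnx",
    "Qwen3-0.6B/tokenizer.json",
    "Qwen3-0.6B/merges.txt",
    "Qwen3-0.6B/vocab.json",
]


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    print("All files present" if not missing else f"Missing: {missing}")
```

Run from the directory that holds the model files, this prints either a confirmation or the list of paths that still need to be downloaded.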

Quantization Method

  • Quantizer: onnxruntime.quantization.matmul_nbits_quantizer.MatMulNBitsQuantizer
  • Quantization type: weight-only INT4
  • Scope: MatMul weights
  • Output format: single-file ONNX artifacts compatible with local sherpa-onnx loading
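
To illustrate what weight-only INT4 quantization means in general, the sketch below quantizes a flat list of weights block by block into 4-bit codes (0 to 15) and restores approximate values from them. It is a conceptual illustration only, not the implementation inside MatMulNBitsQuantizer: the block size of 32, the asymmetric scale and zero-point scheme, and all function names are assumptions, and the settings actually used for this model are not stated here. The real quantizer also handles storage packing and graph rewriting, which are omitted.

```python
def quantize_block(block):
    """Map a list of floats to 4-bit codes (0..15) plus a scale and zero point."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for an all-zero block
    zero_point = round(-lo / scale)
    codes = [min(15, max(0, round(w / scale) + zero_point)) for w in block]
    return codes, scale, zero_point


def dequantize_block(codes, scale, zero_point):
    """Recover approximate float weights from 4-bit codes."""
    return [(c - zero_point) * scale for c in codes]


def quantize_weights(weights, block_size=32):
    """Quantize a flat weight list in independent blocks (block_size is assumed)."""
    return [
        quantize_block(weights[start:start + block_size])
        for start in range(0, len(weights), block_size)
    ]


if __name__ == "__main__":
    weights = [i / 10 - 1.0 for i in range(64)]
    for codes, scale, zero_point in quantize_weights(weights):
        restored = dequantize_block(codes, scale, zero_point)
        print(f"scale={scale:.4f} zero_point={zero_point} first={restored[0]:.3f}")
```

Each block keeps its own scale and zero point, so a few large weights in one block do not reduce the precision of the others. Activations are left in floating point, which is what "weight-only" refers to.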

Local Validation

This INT4 variant was validated locally on Windows with CUDA-enabled sherpa-onnx.

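Reference smoke test on rag_math.wav. All three precisions produced the same transcript (in English: "Integration of differential forms is a fundamental concept in differential geometry."):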

  • fp32: 对微分形式的积分是微分几何中的基本概念。
  • int8: 对微分形式的积分是微分几何中的基本概念。
  • int4: 对微分形式的积分是微分几何中的基本概念。
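
A comparison like the one above can be automated by computing a character error rate (CER) for each variant against the fp32 transcript. The sketch below is one way to do that with a plain edit-distance routine; the function name `char_error_rate` is illustrative, and the transcripts are copied from the smoke test result.

```python
def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(reference), 1)


# Transcripts copied from the smoke test result.
transcripts = {
    "fp32": "对微分形式的积分是微分几何中的基本概念。",
    "int8": "对微分形式的积分是微分几何中的基本概念。",
    "int4": "对微分形式的积分是微分几何中的基本概念。",
}

reference = transcripts["fp32"]
for variant, text in transcripts.items():
    print(f"{variant}: CER vs fp32 = {char_error_rate(reference, text):.3f}")
```

A single utterance only shows that the pipeline runs end to end; measuring any accuracy loss from quantization would need a larger test set.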

Example Usage

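import sherpa_onnx

# Paths are relative to the directory that holds this repository's files.
recognizer = sherpa_onnx.OfflineRecognizer.from_funasr_nano(
    encoder_adaptor="encoder_adaptor.int4.onnx",
    embedding="embedding.int4.onnx",
    llm="llm.int4.onnx",
    tokenizer="Qwen3-0.6B",  # directory with tokenizer.json, merges.txt, vocab.json
    provider="cuda",  # needs a CUDA-enabled sherpa-onnx build
    num_threads=1,
)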

Compatibility

  • Intended for sherpa-onnx offline inference
  • Tested locally with sherpa-onnx==1.12.39+cuda12.cudnn9
  • Tested locally with onnxruntime-gpu==1.24.4
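
To check whether a local environment matches the tested versions listed above, the installed package metadata can be queried with the standard library. This is a minimal sketch; the helper names are illustrative, and it assumes the packages are installed under the distribution names `sherpa-onnx` and `onnxruntime-gpu`, as written in the list.

```python
from importlib import metadata

# Versions reported as tested in the list above.
TESTED = {
    "sherpa-onnx": "1.12.39+cuda12.cudnn9",
    "onnxruntime-gpu": "1.24.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def base_version(version):
    """Drop the local build tag after '+', such as '+cuda12.cudnn9'."""
    return version.split("+", 1)[0]


for package, tested in TESTED.items():
    found = installed_version(package)
    if found is None:
        status = "not installed"
    elif base_version(found) == base_version(tested):
        status = "matches tested version"
    else:
        status = f"differs from tested version {tested}"
    print(f"{package}: {found} ({status})")
```

A differing version does not necessarily mean the model will fail to load; it only indicates that the combination has not been tested here.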

Repository Purpose

This repository packages the INT4 model as a ready-to-use deployment artifact for local or private inference workflows.
