A Thai automatic speech recognition (ASR) model: a full fine-tune of Qwen/Qwen3-ASR-1.7B on the Thai (`th_th`) split of google/fleurs.
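Evaluated on the FLEURS Thai test split (1,021 utterances). Scores are computed with the `evaluate` library on raw model output against the reference, with no text normalisation. Written Thai does not put spaces between words, so WER is computed over whitespace-separated chunks rather than true words. This is why WER is far higher than CER, and CER is the more reliable comparison.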
| Model | CER (%) | WER (%) |
|---|---|---|
| Qwen/Qwen3-ASR-1.7B (base) | 8.32 | 79.56 |
| This model | 7.02 | 61.00 |
| Δ relative improvement | +15.7% | +23.3% |
Lower is better.
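To make the gap between the two metrics concrete, here is a minimal pure-Python sketch of corpus-level CER and WER: Levenshtein edits divided by reference length, summed over utterances. It assumes WER splits on whitespace, as jiwer (which `evaluate` uses) does by default. It approximates rather than reproduces the `evaluate` implementation, and the helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + insertions + deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Corpus-level rate: total edits / total reference tokens."""
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_toks, hyp_toks = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_toks, hyp_toks)
        total += len(ref_toks)
    return edits / total


def cer(refs, hyps):
    return error_rate(refs, hyps, list)       # characters, spaces included


def wer(refs, hyps):
    return error_rate(refs, hyps, str.split)  # whitespace-separated tokens


def relative_improvement(base, new):
    return (base - new) / base


refs = ["แมวกิน ปลา"]
hyps = ["แมวกินปลา"]  # same text, one space missing
print(cer(refs, hyps))  # 0.1
print(wer(refs, hyps))  # 1.0
print(round(relative_improvement(79.56, 61.00), 3))  # WER row of the table: 0.233
```

Dropping one space between two Thai words costs a single character edit, but it turns two reference "words" into one wrong token. That is the effect that inflates Thai WER.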
Usage with the `qwen_asr` package:

```python
import torch
from qwen_asr import Qwen3ASRModel

# Load the fine-tuned checkpoint in bfloat16 on the first GPU.
model = Qwen3ASRModel.from_pretrained(
    "PogusTheWhisper/Qwen3-ASR-1.7B-th-fleurs",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=16,
    max_new_tokens=256,
)

# Set the language explicitly to Thai.
results = model.transcribe(audio="path/to/audio.wav", language="Thai")
print(results[0].text)
```
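For higher-throughput inference, use the vLLM backend:

```python
from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    # vLLM may start worker processes, so create the model under the main guard.
    model = Qwen3ASRModel.LLM(
        model="PogusTheWhisper/Qwen3-ASR-1.7B-th-fleurs",
        gpu_memory_utilization=0.7,  # fraction of GPU memory vLLM may reserve
        max_inference_batch_size=128,
        max_new_tokens=4096,
    )
    results = model.transcribe(audio="path/to/audio.wav", language="Thai")
    print(results[0].text)
```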
All 1.7B parameters of the base model were updated in a full fine-tune (FFT) using the `qwen3_asr_sft.py` script from the official QwenLM/Qwen3-ASR repository.
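The training targets prefix each transcript with a language tag (see the Label format row in the table below). A minimal sketch of building such records as JSONL follows. It assumes the SFT script reads one JSON object per line with `audio` and `text` fields; that schema, and the helper names, are assumptions for illustration, so check the official repository for the exact input format.

```python
import json


def make_label(transcript: str, language: str = "Thai") -> str:
    # Target format from the table: "language <Lang><asr_text><transcript>".
    # Stripping surrounding whitespace is a choice made in this sketch.
    return f"language {language}<asr_text>{transcript.strip()}"


def to_jsonl_line(audio_path: str, transcript: str) -> str:
    # Assumed field names "audio" and "text"; ensure_ascii=False keeps Thai readable.
    record = {"audio": audio_path, "text": make_label(transcript)}
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(pairs, path):
    """pairs: iterable of (audio_path, transcript) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_path, transcript in pairs:
            f.write(to_jsonl_line(audio_path, transcript) + "\n")


print(make_label("สวัสดีครับ"))  # language Thai<asr_text>สวัสดีครับ
# Hypothetical usage: write_jsonl([("clips/0001.wav", "สวัสดีครับ")], "train.jsonl")
```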
| Hyperparameter | Value |
|---|---|
| Base model | Qwen/Qwen3-ASR-1.7B |
| Method | Full fine-tune (FFT, not LoRA) |
| Dataset | google/fleurs (th_th, 2,602 train / 1,021 test utterances) |
| Label format | `language Thai<asr_text>{transcript}` |
| Optimizer | AdamW (HF Trainer defaults) |
| Learning rate | 2e-5 (the value tested officially with an effective batch size of 32) |
| LR scheduler | cosine (smoother decay, better final epochs) |
| Warmup ratio | 0.05 (0.1 was too aggressive with cosine) |
| Weight decay | 0.01 (to reduce memorization) |
| Effective batch size | 32 (per-device 1 × grad_acc 32) |
| Epochs | 5 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA RTX 3090 (24 GB) |
| Training time | ~25 minutes |
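The batch and schedule settings above can be worked through numerically. The sketch below recomputes them from the table values and follows the shape of Transformers' `get_cosine_schedule_with_warmup` (linear warmup, then half-cosine decay to zero). The step count is an assumption: it takes one optimizer step per full group of 32 utterances on a single GPU, and the real count depends on the Trainer version and dataloader settings.

```python
import math

# Values from the hyperparameter table.
TRAIN_UTTERANCES = 2602
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 32
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_RATIO = 0.05

# Single GPU, so effective batch = per-device batch x gradient accumulation steps.
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM

# Assumption: incomplete final accumulation groups are dropped (floor division).
STEPS_PER_EPOCH = TRAIN_UTTERANCES // EFFECTIVE_BATCH
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(f"steps/epoch={STEPS_PER_EPOCH}, total={TOTAL_STEPS}, warmup={WARMUP_STEPS}")
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run takes roughly 400 optimizer steps, of which about 20 are warmup.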
```bibtex
@misc{qwen3asr,
  title  = {Qwen3-ASR},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-ASR-1.7B}
}

@inproceedings{conneau2023fleurs,
  title     = {FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech},
  author    = {Conneau, Alexis and Ma, Min and Khanuja, Simran and Zhang, Yu and Axelrod, Vera and Dalmia, Siddharth and Riesa, Jason and Rivera, Clara and Bapna, Ankur},
  booktitle = {SLT},
  year      = {2023}
}
```