OmniVoice 🌍



This OmniVoice variant was trained exclusively on the Chinese and English subsets of the Emilia dataset and corresponds to the "OmniVoice-Emilia" model described in our paper. It is intended for researchers who want to reproduce the experimental results reported there. For general use, we recommend the main OmniVoice checkpoint, which was trained on the full dataset and performs better.

When using this checkpoint, set denoise = False and lang_id = None: the model was trained without prompt denoising or language-ID conditioning.
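The two required settings can be kept in one place so they are not accidentally overridden. The sketch below is a hypothetical helper, not part of the OmniVoice codebase. Only the setting names `denoise` and `lang_id` come from this card. The helper name `emilia_generation_kwargs`, the `EMILIA_REQUIRED` table, and the `text` keyword in the usage line are illustrative assumptions. How these keyword arguments reach the real inference entry point is also an assumption, so check the OmniVoice GitHub repository for the actual interface.

```python
# Hypothetical helper, NOT the official OmniVoice API. The names `denoise` and
# `lang_id` come from the model card; everything else here is illustrative.
from typing import Any, Dict

MODEL_ID = "k2-fsa/OmniVoice-Emilia"  # repository name shown on the Hub page

# This checkpoint was trained without prompt denoising or language-ID conditioning.
EMILIA_REQUIRED: Dict[str, Any] = {"denoise": False, "lang_id": None}


def emilia_generation_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Build generation kwargs for OmniVoice-Emilia (assumed interface).

    Other keyword arguments pass through unchanged, but any attempt to enable
    denoising or set a language ID raises ValueError.
    """
    for key, required in EMILIA_REQUIRED.items():
        if key in overrides and overrides[key] != required:
            raise ValueError(
                f"{MODEL_ID} expects {key}={required!r}, got {overrides[key]!r}"
            )
    # Required settings are applied last so they always win.
    return {**overrides, **EMILIA_REQUIRED}


# `text` is a placeholder keyword, not a confirmed OmniVoice parameter.
kwargs = emilia_generation_kwargs(text="Hello, world.")
print(kwargs)  # {'text': 'Hello, world.', 'denoise': False, 'lang_id': None}
```

Failing loudly on a conflicting value, rather than silently overwriting it, makes a misconfigured reproduction run visible immediately.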

Citation

@article{zhu2026omnivoice,
      title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
      author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
      journal={arXiv preprint arXiv:2604.00688},
      year={2026}
}
Model details (k2-fsa/OmniVoice-Emilia): Safetensors format · Model size: 0.6B params · Tensor types: I64, F32
