Fine-tuned version of facebook/mms-tts-kdh
on the Eyaa-Tom dataset for Kotokoli + Tem (kdh).
Kotokoli and Tem are two names for the same language (ISO 639-3: kdh), so the model was trained on the merged data from both dataset folders.
| Field | Value |
|---|---|
| Language | Kotokoli + Tem |
| ISO 639-3 (MMS) | kdh |
| Your ISO | kdh / kot |
| Region | Togo/Ghana |
| Family | Gur (Niger-Congo) |
| Base model | facebook/mms-tts-kdh |
import torch
import torchaudio
from transformers import VitsModel, VitsTokenizer

# Load the fine-tuned checkpoint and its tokenizer
model = VitsModel.from_pretrained("Umbaji/eyaa-tom-mms-tts-kdh")
tokenizer = VitsTokenizer.from_pretrained("Umbaji/eyaa-tom-mms-tts-kdh")

inputs = tokenizer("your text here", return_tensors="pt")
with torch.no_grad():
    # waveform has shape (batch, samples); take the first item
    waveform = model(**inputs).waveform[0]

# torchaudio.save expects a (channels, samples) tensor
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
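If torchaudio is not available, the waveform can also be written with Python's standard library. The following is a minimal sketch, not part of the original model card: the helper name `save_wav_pcm16` is hypothetical, and the 16 kHz sine tone is only a stand-in for model output. In real use, take the rate from `model.config.sampling_rate`.

```python
import math
import struct
import wave


def save_wav_pcm16(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in signal (assumption): a 0.5 s, 440 Hz tone at 16 kHz
rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
save_wav_pcm16("tone.wav", tone, rate)
```

With the model output from the snippet above, the equivalent call would be `save_wav_pcm16("output.wav", waveform.tolist(), model.config.sampling_rate)`.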
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Pratap, Vineel and others},
journal={arXiv preprint arXiv:2305.13516},
year={2023}
}
Fine-tuned: 2026-02-25, Eyaa-Tom project