---
license: cc-by-4.0
language:
- en
- it
- pt
- de
- fr
- es
- ja
- zh
tags:
- automatic-speech-recognition
- speech
- audio
- Transformer
- flow-matching
- discrete-flow-matching
- pytorch
- hf-asr-leaderboard
library_name: drax
---

# Drax: Speech Recognition with Discrete Flow Matching

## Model Overview

The Drax model family provides speech recognition models based on discrete flow matching.
The `drax-v1` model supports eight languages: English, Spanish, French, Portuguese, German, Italian, Japanese, and Chinese.
It is an encoder-decoder model consisting of a Whisper-large-v3 encoder and a DiT-based decoder, with a total of ~1.2B parameters.

More details on usage are available in our GitHub repo, [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax), and in our [paper](https://arxiv.org/abs/2510.04162).

## Usage

See [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax) for installation instructions.

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
result = asr.transcribe("/path/to/audio.wav", language="en")
print(result[0].transcript)
```

Control the number of sampling steps, the temperature, and other generation parameters:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
result = asr.transcribe("/path/to/audio.wav", language="en", sampling_steps=32, temperature=1e-2)
print(result[0].transcript)
```
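
To pick a value for `sampling_steps`, it can help to run the same file at several settings and compare the transcripts and the wall-clock time. The following is a minimal sketch, not part of the `drax` package: `sweep_sampling_steps` is a hypothetical helper, and it assumes only what the examples above show, namely that `transcribe` accepts `language` and `sampling_steps` and returns a list whose items expose `.transcript`.

```python
import time
from typing import Any, Dict, Iterable, Tuple


def sweep_sampling_steps(
    asr: Any, audio_path: str, language: str, steps: Iterable[int]
) -> Dict[int, Tuple[str, float]]:
    """Hypothetical helper: transcribe one file at several step counts.

    Returns {sampling_steps: (transcript, elapsed_seconds)}.
    """
    out: Dict[int, Tuple[str, float]] = {}
    for n in steps:
        start = time.perf_counter()
        # Assumption: same call signature as in the usage examples above.
        result = asr.transcribe(audio_path, language=language, sampling_steps=n)
        elapsed = time.perf_counter() - start
        out[n] = (result[0].transcript, elapsed)
    return out
```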

Batch inference:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]
result = asr.transcribe(audio_paths, language=languages)
# result is a list (as in the single-file example), so print each transcript
for r in result:
    print(r.transcript)
```
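
When transcribing many files, it is often convenient to keep each transcript next to its source path. The following is a minimal sketch of that pattern. `transcribe_to_dict` is a hypothetical helper, not part of `drax`, and it assumes that `transcribe` returns one result per input, in input order, each exposing `.transcript`.

```python
from typing import Any, Dict, Sequence


def transcribe_to_dict(
    asr: Any, audio_paths: Sequence[str], languages: Sequence[str]
) -> Dict[str, str]:
    """Hypothetical helper: map each audio path to its transcript."""
    if len(audio_paths) != len(languages):
        raise ValueError("audio_paths and languages must have the same length")
    # Assumption: results come back in the same order as the inputs.
    results = asr.transcribe(list(audio_paths), language=list(languages))
    return {path: r.transcript for path, r in zip(audio_paths, results)}
```

With a loaded `Transcriber`, calling `transcribe_to_dict(asr, audio_paths, languages)` would then give a dictionary keyed by file path, provided the ordering assumption holds.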

## Citation

```bibtex
@article{navon2025drax,
  title={Drax: Speech Recognition with Discrete Flow Matching},
  author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2510.04162},
  year={2025}
}
```