How to use OthmaneJ/distil-wav2vec2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="OthmaneJ/distil-wav2vec2")

# Load model directly
from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("OthmaneJ/distil-wav2vec2")
model = AutoModelForCTC.from_pretrained("OthmaneJ/distil-wav2vec2")

This model is a distilled version of the wav2vec2 model (https://arxiv.org/pdf/2006.11477.pdf). It is 45% smaller than the original wav2vec2 base model and, by the timings reported below, nearly twice as fast on GPU.
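The size and speed claims can be checked against the figures in the results table of this card. The snippet below is a small arithmetic sketch: the numbers are copied from that table, and the dictionary and variable names are illustrative only.

```python
# Figures copied from the results table in this model card.
# Dictionary keys and variable names are illustrative, not part of any API.
distil = {"size_mb": 197.9, "cpu_s": 0.4006, "gpu_s": 0.0046}
base = {"size_mb": 360.0, "cpu_s": 0.4919, "gpu_s": 0.0082}

# Relative size reduction: 1 - (distilled size / base size)
size_reduction = 1 - distil["size_mb"] / base["size_mb"]

# Speedup: base time divided by distilled time (higher means faster)
cpu_speedup = base["cpu_s"] / distil["cpu_s"]
gpu_speedup = base["gpu_s"] / distil["gpu_s"]

print(f"size reduction: {size_reduction:.1%}")  # size reduction: 45.0%
print(f"CPU speedup: {cpu_speedup:.2f}x")       # CPU speedup: 1.23x
print(f"GPU speedup: {gpu_speedup:.2f}x")       # GPU speedup: 1.78x
```

On these reported timings the GPU speedup is close to 2x, while the CPU speedup is more modest.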
This model achieves the following results (speed is measured for a batch size of 64):
| Model | Size | WER LibriSpeech test-clean | WER LibriSpeech test-other | Speed on CPU | Speed on GPU |
|---|---|---|---|---|---|
| Distil-wav2vec2 | 197.9 MB | 0.0983 | 0.2266 | 0.4006s | 0.0046s |
| wav2vec2-base | 360 MB | 0.0389 | 0.1047 | 0.4919s | 0.0082s |
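For readers unfamiliar with the WER columns: word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, so 0.0983 corresponds to about 9.8% of words in error. The sketch below assumes that standard definition; the model card does not say which tool or text normalization produced the reported scores, and `word_error_rate` is a hypothetical helper written for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 4))  # 0.3333
```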
A notebook that runs on Google Colab is available at https://github.com/OthmaneJ/distil-wav2vec2