forced_decoder_ids not applied properly during generation
#10
by minseong-ringle - opened
input_features = processor(input, return_tensors="pt").input_features
forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe", no_timestamps=False)
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids)
# predicted_ids still contains the <|notimestamps|> token (50363):
# tensor([[50258, 50259, 50359, 50363
# -> "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>
# in the decoded transcription.
# Setting the ids on the model config instead gives the same result:
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe", no_timestamps=False)
These are the code snippets I have tried so far.
Even with no_timestamps=False, I cannot get rid of the <|notimestamps|> token in the decoder prompt.
Any suggestions?
Thank you in advance for your help.
Hey! This might be because the "<|notimestamps|>" token is not in the list of suppressed tokens, which means the model is simply predicting this token on its own.
We should probably add it to the suppress_tokens list.
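To make the reasoning above concrete, here is a minimal toy sketch in plain Python. It is not the Transformers implementation. It assumes that forced decoder ids are (position, token id) pairs, that any position without a forced id is predicted freely, and that suppressed ids are simply excluded from the choice. The helper names `pick_next` and `toy_generate`, the scores, and the id used for the first timestamp token are all hypothetical; the other ids are the ones quoted in the post.

```python
# Token ids quoted in the post above.
SOT, EN, TRANSCRIBE, NO_TIMESTAMPS = 50258, 50259, 50359, 50363
# Assumption for illustration only: id of the first timestamp token.
FIRST_TIMESTAMP = 50364


def pick_next(logits, position, forced, suppressed):
    """Greedy choice of one token id for a toy decoding step.

    logits:     dict mapping token id -> score (made-up numbers here)
    forced:     dict mapping position -> token id that must be emitted
    suppressed: set of token ids that may never be emitted
    """
    if position in forced:
        return forced[position]
    allowed = {tok: score for tok, score in logits.items() if tok not in suppressed}
    return max(allowed, key=allowed.get)


def toy_generate(step_logits, forced_pairs, suppressed=frozenset()):
    """Run the toy decoder for len(step_logits) steps after the start token."""
    forced = dict(forced_pairs)
    tokens = [SOT]
    for position, logits in enumerate(step_logits, start=1):
        tokens.append(pick_next(logits, position, forced, suppressed))
    return tokens


# Hypothetical scores: at position 3 the toy model prefers <|notimestamps|>.
step_logits = [
    {EN: 0.5, NO_TIMESTAMPS: 0.1},               # position 1, forced anyway
    {TRANSCRIBE: 0.5, NO_TIMESTAMPS: 0.1},       # position 2, forced anyway
    {NO_TIMESTAMPS: 2.0, FIRST_TIMESTAMP: 1.5},  # position 3, free
]

# With no_timestamps=False only the language and task positions are forced.
forced_pairs = [(1, EN), (2, TRANSCRIBE)]

without_suppression = toy_generate(step_logits, forced_pairs)
with_suppression = toy_generate(step_logits, forced_pairs, suppressed={NO_TIMESTAMPS})

print(without_suppression)  # [50258, 50259, 50359, 50363]
print(with_suppression)     # [50258, 50259, 50359, 50364]
```

In this toy setup, leaving position 3 unforced is not enough to keep <|notimestamps|> out of the output: the token disappears only once its id is also in the suppressed set, which is the change suggested in the reply.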