Automatic Speech Recognition
Transformers
PyTorch
TensorFlow
JAX
Safetensors
whisper
audio
hf-asr-leaderboard
How to use patrol90/whisper-large with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="patrol90/whisper-large")

    # Load model directly
    from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

    processor = AutoProcessor.from_pretrained("patrol90/whisper-large")
    model = AutoModelForSpeechSeq2Seq.from_pretrained("patrol90/whisper-large")

from typing import Any, Dict

import whisper
from transformers.pipelines.audio_utils import ffmpeg_read

SAMPLE_RATE = 16000  # sampling rate (Hz) the audio is decoded to


class EndpointHandler:
    def __init__(self, path=""):
        # load the model ("large" checkpoint; `path` is accepted but not used)
        self.model = whisper.load_model("large")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Args:
            data (:obj:`dict`):
                includes the audio file as raw bytes under the "inputs" key
        Return:
            A :obj:`dict` with the transcription segments, the detected
            language code, and the full transcribed text
        """
        # process input: decode the audio bytes into a NumPy array at SAMPLE_RATE
        inputs = data.pop("inputs", data)
        audio_nparray = ffmpeg_read(inputs, SAMPLE_RATE)

        # run inference pipeline
        result = self.model.transcribe(audio_nparray)

        # postprocess the prediction
        return {
            "segments": result["segments"],
            "language_code": result["language"],
            "text": result["text"],
        }
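
The handler reads raw audio bytes from the "inputs" key and returns "segments", "language_code", and "text". The following is a minimal sketch of both sides of that contract, using only the standard library so it runs without the model. The helper names `make_wav_bytes` and `format_segments` are hypothetical, the example response is hand-written rather than real model output, and the segment keys "start", "end", and "text" are assumed to match what openai-whisper's `transcribe` returns.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # same rate the handler passes to ffmpeg_read


def make_wav_bytes(seconds=1.0, freq_hz=440.0):
    """Return an in-memory 16-bit mono WAV file containing a sine tone."""
    n_frames = int(seconds * SAMPLE_RATE)
    frames = b"".join(
        struct.pack(
            "<h",
            int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)),
        )
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buf.getvalue()


def format_segments(response):
    """Render each segment of a handler response as '[start - end] text'."""
    return [
        f"[{seg['start']:6.2f} - {seg['end']:6.2f}] {seg['text'].strip()}"
        for seg in response["segments"]
    ]


# Request payload in the shape the handler reads: raw audio bytes under "inputs".
payload = {"inputs": make_wav_bytes()}

# Hypothetical response, hand-written for illustration (not real model output).
example_response = {
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " General remarks."},
    ],
    "language_code": "en",
    "text": " Hello there. General remarks.",
}

for line in format_segments(example_response):
    print(line)
```

With the model installed, `EndpointHandler()(payload)` would presumably return a dict of the same shape as `example_response`, which `format_segments` could then render.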