whisper-large-v3

Multi-format version of openai/whisper-large-v3, optimized for deployment.

Model Information

Property	Value
Base model	openai/whisper-large-v3
Task	automatic-speech-recognition
Type	Audio/Speech Model
Trust Remote Code	False

Available Versions

Directory	Format	Description	Size
`safetensors-fp32/`	PyTorch FP32	Baseline, highest accuracy	5890 MB
`onnx-fp32/`	ONNX FP32	Portable, cross-platform	5895 MB
`safetensors-fp16/`	PyTorch FP16	GPU inference, ~50% smaller	2946 MB
`onnx-int8/`	ONNX INT8	CPU inference, ~75% smaller	621 MB
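
As a rough guide to choosing among the four directories in this table, the mapping can be sketched as a small lookup. The names `SUBFOLDERS` and `pick_subfolder` and the `prefer_accuracy` flag are hypothetical, and the selection rule is only one reading of the table, not a recommendation from the model authors.

```python
# Hypothetical helper: maps (device, prefer_accuracy) to a subfolder from the table.
SUBFOLDERS = {
    ("cuda", False): "safetensors-fp16",  # GPU inference, about half the FP32 size
    ("cuda", True): "safetensors-fp32",   # baseline, highest accuracy
    ("cpu", False): "onnx-int8",          # CPU inference, smallest files
    ("cpu", True): "onnx-fp32",           # portable, full precision
}


def pick_subfolder(device: str, prefer_accuracy: bool = False) -> str:
    """Return the directory name to pass as `subfolder=` to from_pretrained."""
    key = (device, prefer_accuracy)
    if key not in SUBFOLDERS:
        raise ValueError(f"unsupported device: {device!r}")
    return SUBFOLDERS[key]


print(pick_subfolder("cuda"))  # safetensors-fp16
print(pick_subfolder("cpu"))   # onnx-int8
```

The safetensors directories load with AutoModelForSpeechSeq2Seq and the ONNX directories with ORTModelForSpeechSeq2Seq, so the caller still has to pick the matching loader class.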

Usage

PyTorch (GPU)

import librosa
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Load the FP16 weights for GPU inference
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "n24q02m/whisper-large-v3",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16
).cuda()
processor = AutoProcessor.from_pretrained(
    "n24q02m/whisper-large-v3",
    subfolder="safetensors-fp16"
)

# Load the audio file, resampled to the 16 kHz rate Whisper expects
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# The processor returns FP32 features; cast them to FP16 to match the model
input_features = inputs["input_features"].to("cuda", dtype=torch.float16)

with torch.no_grad():
    generated_ids = model.generate(input_features, max_new_tokens=256)
    transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(transcription)

ONNX Runtime (CPU)

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import AutoProcessor
import librosa

# CPU inference with ONNX Runtime
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "n24q02m/whisper-large-v3",
    subfolder="onnx-int8"
)
processor = AutoProcessor.from_pretrained(
    "n24q02m/whisper-large-v3",
    subfolder="onnx-int8"
)

# Load the audio at 16 kHz and run inference
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(inputs["input_features"], max_new_tokens=256)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(transcription)
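
Because an ONNX export of this model is split across several files, a quick check of a locally downloaded directory can catch an incomplete copy before loading. The sketch below is only illustrative: `missing_onnx_files` is a hypothetical helper, and the expected file names are taken from the notes in this card; a quantized export may use different names.

```python
from pathlib import Path

# Assumption: file names as listed in this card's notes. Adjust if the
# directory you downloaded uses other names (e.g. for quantized exports).
EXPECTED_ONNX_FILES = ("encoder_model.onnx", "decoder_model.onnx")


def missing_onnx_files(model_dir, expected=EXPECTED_ONNX_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# Usage (the path is a placeholder for wherever the subfolder was downloaded):
# missing = missing_onnx_files("path/to/onnx-int8")
# if missing:
#     raise FileNotFoundError(f"incomplete ONNX export, missing: {missing}")
```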

Notes

SafeTensors FP16 is the primary format for GPU inference
Use AutoProcessor rather than AutoTokenizer to process audio
The ONNX export produces multiple files: encoder_model.onnx, decoder_model.onnx
Audio must be resampled to 16 kHz before inference
The model is multilingual; language and task can be specified in generate()
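
The last two notes can be sketched as a pair of small helpers: one that rejects audio not sampled at 16 kHz, and one that builds the keyword arguments for generate(). The helper names `require_16khz` and `whisper_generate_kwargs` are hypothetical; `language` and `task` are the generate() arguments mentioned above, and "transcribe" and "translate" are Whisper's two tasks.

```python
EXPECTED_SAMPLE_RATE = 16000  # Whisper's feature extractor expects 16 kHz audio


def require_16khz(sampling_rate: int) -> None:
    """Raise if the audio was not resampled to 16 kHz."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample first, e.g. librosa.load(path, sr=16000)"
        )


def whisper_generate_kwargs(language=None, task=None, max_new_tokens=256):
    """Build keyword arguments for model.generate(), omitting unset options."""
    if task is not None and task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    kwargs = {"max_new_tokens": max_new_tokens}
    if language is not None:
        kwargs["language"] = language  # e.g. "vietnamese"
    if task is not None:
        kwargs["task"] = task
    return kwargs


# Usage with the model and inputs from the examples in this card:
# require_16khz(sr)
# generated_ids = model.generate(
#     inputs["input_features"],
#     **whisper_generate_kwargs(language="vietnamese", task="transcribe"),
# )
```

Leaving `language` unset lets the model detect the spoken language itself; setting `task="translate"` asks for an English translation instead of a transcript.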

License

Apache 2.0 (same license as the base model)

Credits

Base model: openai/whisper-large-v3
Conversion: Optimum + PyTorch
