Whisper Medium Quran (LoRA Fine-Tuned)
This is a specialized Automatic Speech Recognition (ASR) model for Quranic Recitation. It is a fine-tuned version of openai/whisper-medium, optimized to recognize Quranic Arabic with high accuracy while maintaining robustness across different recording conditions.
Model Performance
- Word Error Rate (WER): 12.69% on the tarteel-ai/everyayah validation set (a sketch of how such an evaluation could be reproduced is shown after this list).
- Accuracy: The model demonstrates high precision in capturing Quranic vocabulary and the nuances of the Uthmani script.
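The evaluation script itself is not part of this card. The following is a minimal sketch of how a comparable WER measurement could be run with the Hugging Face evaluate library; the split name (`validation`) and transcript column name (`text`) are assumptions, not taken from the training code.

```python
# Minimal WER evaluation sketch (assumed split and column names).
from datasets import load_dataset
from transformers import pipeline
import evaluate

pipe = pipeline(
    "automatic-speech-recognition",
    model="MaddoggProduction/whisper-m-quran-lora-dataset-mix",
    device=0,  # GPU; use -1 for CPU
)
wer_metric = evaluate.load("wer")

# Small subset for illustration; the full validation split was used for the reported score.
ds = load_dataset("tarteel-ai/everyayah", split="validation").select(range(100))

predictions, references = [], []
for sample in ds:
    audio = sample["audio"]
    out = pipe({"array": audio["array"], "sampling_rate": audio["sampling_rate"]})
    predictions.append(out["text"])
    references.append(sample["text"])  # assumed transcript column name

print(f"WER: {wer_metric.compute(predictions=predictions, references=references):.2%}")
```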
Training Details
The model was trained using LoRA (Low-Rank Adaptation) in a multi-stage curriculum learning process to ensure stability and precision.
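The exact training code is not included in this card. For orientation, a LoRA setup for Whisper built with the PEFT library typically looks like the sketch below; the rank, scaling, dropout, and target modules shown are illustrative assumptions rather than the values used for this model.

```python
# Minimal LoRA setup sketch for whisper-medium (hyperparameters are assumptions).
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")

lora_config = LoraConfig(
    r=32,                                 # rank of the low-rank update (assumed)
    lora_alpha=64,                        # scaling factor (assumed)
    lora_dropout=0.05,                    # dropout on the adapter layers (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted in Whisper
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```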
Datasets
The training utilized a mix of professional and diverse recitations from two primary sources:
- MohamedRashad/Quran-Recitations
- tarteel-ai/everyayah (Highly diverse professional recitations)
Methodology
- Curriculum Learning: The model was trained in stages across these datasets to progressively refine its understanding of Tajweed and Quranic sentence structures.
- Data Augmentation: To keep the model robust to real-world conditions (non-studio microphones, background noise, varying volumes), diverse audio augmentations, including gain adjustments and spectral masking, were applied during training (see the sketch after this list).
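The precise augmentation pipeline is not published here. The sketch below illustrates the two named transforms, gain adjustment and spectral masking, using torchaudio; the gain range and mask sizes are illustrative assumptions.

```python
# Illustrative augmentation sketch (parameter values are assumptions).
import random
import torch
import torchaudio

def augment_example(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Apply a random gain to the waveform, then spectral masking on its mel spectrogram."""
    # Random gain adjustment to simulate varying recording volumes.
    gain_db = random.uniform(-6.0, 6.0)
    waveform = torchaudio.transforms.Vol(gain=gain_db, gain_type="db")(waveform)

    # SpecAugment-style masking on the mel spectrogram (Whisper consumes log-mel features).
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=80)(waveform)
    mel = torchaudio.transforms.FrequencyMasking(freq_mask_param=15)(mel)
    mel = torchaudio.transforms.TimeMasking(time_mask_param=35)(mel)
    return mel  # augmented mel spectrogram
```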
Usage
This model is fully compatible with the Hugging Face transformers pipeline. For longer verses, chunking is recommended to maintain context.
```python
from transformers import pipeline

# Load the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="MaddoggProduction/whisper-m-quran-lora-dataset-mix",
    device=0,  # GPU; use -1 for CPU
)

# Transcribe audio (chunking enabled for long verses)
result = pipe(
    "path_to_audio.mp3",
    chunk_length_s=30,   # critical for long verses like 2:282, to avoid hallucinations
    stride_length_s=5,
    batch_size=8,
    return_timestamps=True,
)

print(result["text"])
```