# Whisper-Base Fine-tuned for Bahraini Arabic
Fine-tuned version of openai/whisper-base on the Bahraini Speech Dataset for Bahraini Arabic dialect transcription with Arabic-English code-switching support.
Developed as part of the Nota AI Meeting Assistant senior project at the University of Bahrain.
## Model Details
| Property | Value |
|---|---|
| Base model | openai/whisper-base |
| Parameters | 72.6M |
| Language | Bahraini Arabic + English code-switching |
| Task | Automatic Speech Recognition |
| Format | Quantized ONNX (INT8) |
| Model size | 74MB |
## Training Details
| Hyperparameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Batch size | 32 |
| Max steps | 8,000 |
| Warmup steps | 500 |
| Precision | bf16 |
| GPU | NVIDIA A100 (40GB) |
| Training time | ~2.5 hours |
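The table above maps directly onto Hugging Face training arguments. The following is a hypothetical reconstruction for illustration only (the actual training script is not published, and `output_dir` is an assumed name):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration in the table above;
# all values come from the table, output_dir is an assumption.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-base-bahraini",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    max_steps=8000,
    warmup_steps=500,
    bf16=True,
)
```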
## Dataset
- Name: Hishambarakat/Bahraini_Speech_Dataset
- Size: 69,224 training clips (~42 hours after filtering)
- Filtering: Removed clips shorter than 1.5 seconds
- Augmentation: Gaussian noise, time stretching, pitch shifting
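The duration filter and the Gaussian-noise augmentation above can be sketched as follows. This is a minimal illustration assuming waveforms are NumPy float arrays at 16 kHz; the project's actual filtering and augmentation code is not published, and the 20 dB SNR default is an assumption.

```python
import numpy as np

SAMPLE_RATE = 16000
MIN_DURATION_S = 1.5  # clips shorter than this were dropped

def keep_clip(audio, sr=SAMPLE_RATE):
    """Return True if the clip is long enough to keep."""
    return len(audio) / sr >= MIN_DURATION_S

def add_gaussian_noise(audio, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# A 2-second clip passes the filter; a 1-second clip does not.
clip = np.sin(2 * np.pi * 440 * np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE)
short = clip[:SAMPLE_RATE]
noisy = add_gaussian_noise(clip, rng=np.random.default_rng(0))
```

Time stretching and pitch shifting would follow the same pattern but need a resampling library, so they are omitted here.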
## Preprocessing
- Audio resampled from 24kHz to 16kHz
- Arabic diacritics removed
- Bahraini dialect spelling preserved (no MSA normalization)
- English words preserved as-is
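The diacritic-removal step can be sketched with a small regex over the Arabic combining-mark block (a minimal illustration assuming the tashkeel range U+064B–U+0652 plus tatweel; the exact normalization code is not published):

```python
import re

# Arabic tashkeel (U+064B-U+0652) plus tatweel (U+0640); base letters,
# dialect spellings, and English words pass through untouched.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def strip_diacritics(text):
    """Remove Arabic diacritics without touching any other characters."""
    return DIACRITICS.sub("", text)

print(strip_diacritics("مَرْحَبًا meeting"))  # مرحبا meeting
```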
## Evaluation Results
| Model | WER |
|---|---|
| whisper-base (no fine-tuning) | ~88% |
| This model (V1, 6000 steps) | 58.3% |
| This model (V2, 8000 steps + filtering + augmentation) | 54.4% |
**Note:** WER is measured against model-generated test labels, not human annotations. True WER against human ground truth is estimated to be 5-10 points lower.
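For reference, WER here is the standard word-level edit distance divided by the reference word count. A minimal implementation (assuming whitespace tokenization and a non-empty reference; this is not the project's evaluation script):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length.

    Assumes whitespace tokenization and a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming edit distance over word sequences.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            temp = d[j]
            d[j] = min(d[j] + 1,      # deletion
                       d[j - 1] + 1,  # insertion
                       prev_diag + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev_diag = temp
    return d[-1] / len(ref)

print(wer("a b c", "a x c"))  # one substitution out of three words
```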
## Usage with Transformers.js (Browser)
```javascript
import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Fatimaa75/whisper-base-bahraini'
);

const result = await transcriber(audioData, {
  language: 'arabic',
  task: 'transcribe',
});

console.log(result.text);
```
## Usage with Python
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

processor = WhisperProcessor.from_pretrained("Fatimaa75/whisper-base-bahraini")
model = WhisperForConditionalGeneration.from_pretrained("Fatimaa75/whisper-base-bahraini")

# Process audio (must be 16kHz)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        language="arabic",
        task="transcribe",
        no_repeat_ngram_size=3,
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
## Limitations
- Model capacity: whisper-base (72.6M parameters) has a WER ceiling for this dialect; whisper-small is recommended for production
- Short clips: Clips under 1.5 seconds remain unreliable
- Heavy dialect: Non-standard Bahraini words cause failures
- Code-switching: English word preservation is inconsistent
- Test labels: Evaluation uses model-generated labels, not human annotations
## Intended Use
Designed for integration into Nota, a privacy-first AI meeting assistant. Best suited for:
- Meeting transcription (post-processing)
- Transcript search and summarization
- Bahraini Arabic speech with mixed English terminology
Not recommended for:
- Real-time live captioning where accuracy is critical
- Single-word or very short utterance recognition
## Citation
If you use this model, please cite:
- University of Bahrain - ITCS 499 Senior Project 2025-2026
- Nota AI Meeting Assistant
- Fine-tuned Whisper-base for Bahraini Arabic ASR