Whisper-Base Fine-tuned for Bahraini Arabic

A fine-tuned version of openai/whisper-base, trained on the Bahraini Speech Dataset to transcribe the Bahraini Arabic dialect, with support for Arabic-English code-switching.

Developed as part of the Nota AI Meeting Assistant senior project at the University of Bahrain.

Model Details

| Property | Value |
|---|---|
| Base model | openai/whisper-base |
| Parameters | 72.6M |
| Language | Bahraini Arabic + English code-switching |
| Task | Automatic Speech Recognition |
| Format | Quantized ONNX (INT8) |
| Model size | 74MB |

Training Details

| Hyperparameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Batch size | 32 |
| Max steps | 8,000 |
| Warmup steps | 500 |
| Precision | bf16 |
| GPU | NVIDIA A100 (40GB) |
| Training time | ~2.5 hours |
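As a rough sketch, these hyperparameters map onto Hugging Face `Seq2SeqTrainingArguments` as below. The exact training script is not published, so `output_dir` and the evaluation/generation settings here are assumptions, not the authors' configuration.

```python
# Hedged config sketch: maps the table above onto Seq2SeqTrainingArguments.
# output_dir and predict_with_generate are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-bahraini",  # assumed path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    max_steps=8000,
    warmup_steps=500,
    bf16=True,  # bf16 mixed precision, as used on the A100
    predict_with_generate=True,  # assumption: decode during eval
)
```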

Dataset

  • Name: Hishambarakat/Bahraini_Speech_Dataset
  • Size: 69,224 training clips (~42 hours after filtering)
  • Filtering: Removed clips shorter than 1.5 seconds
  • Augmentation: Gaussian noise, time stretching, pitch shifting
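The duration filter described above can be sketched in a few lines. The clip dictionaries and field names here are illustrative, not the dataset's actual schema:

```python
# Minimal sketch of the filtering step: drop clips shorter than 1.5 s.
# The clip dicts and the "audio" field name are illustrative only.
MIN_DURATION_S = 1.5

def filter_short_clips(clips, sample_rate=24000):
    """Keep only clips whose audio lasts at least MIN_DURATION_S seconds."""
    min_samples = int(MIN_DURATION_S * sample_rate)
    return [c for c in clips if len(c["audio"]) >= min_samples]

clips = [
    {"audio": [0.0] * 36000},  # exactly 1.5 s at 24 kHz -> kept
    {"audio": [0.0] * 12000},  # 0.5 s -> dropped
]
print(len(filter_short_clips(clips)))  # 1
```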

Preprocessing

  • Audio resampled from 24kHz to 16kHz
  • Arabic diacritics removed
  • Bahraini dialect spelling preserved (no MSA normalization)
  • English words preserved as-is

Evaluation Results

| Model | WER |
|---|---|
| whisper-base (no fine-tuning) | ~88% |
| This model (V1, 6000 steps) | 58.3% |
| This model (V2, 8000 steps + filtering + augmentation) | 54.4% |

Note: WER is measured against model-generated test labels, not human annotations. True WER against human ground truth is estimated to be 5-10 points lower.
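For reference, the WER reported above is word-level Levenshtein distance divided by reference length. A minimal implementation (libraries such as jiwer are the usual choice in practice):

```python
# Reference WER implementation: edit distance over words / reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 ref words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(wer("a b c d", "a x c d"))  # 0.25
```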

Usage with Transformers.js (Browser)

```javascript
import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Fatimaa75/whisper-base-bahraini'
);

const result = await transcriber(audioData, {
  language: 'arabic',
  task: 'transcribe',
});

console.log(result.text);
```

Usage with Python

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

processor = WhisperProcessor.from_pretrained("Fatimaa75/whisper-base-bahraini")
model = WhisperForConditionalGeneration.from_pretrained("Fatimaa75/whisper-base-bahraini")

# Process audio (must be 16kHz)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        language="arabic",
        task="transcribe",
        no_repeat_ngram_size=3,
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
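The Python example requires 16 kHz input. A minimal linear-interpolation resampler for illustration only; in practice `librosa.resample` or `torchaudio.transforms.Resample` is preferable because it applies anti-aliasing:

```python
# Illustrative linear-interpolation resampler (no anti-aliasing).
# Prefer librosa or torchaudio for real use.
def resample_linear(samples, sr_in, sr_out):
    """Resample a list of floats from sr_in to sr_out by linear interpolation."""
    n_out = int(len(samples) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out          # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

audio_16k = resample_linear([0.0] * 24000, 24000, 16000)  # 1 s of 24 kHz audio
print(len(audio_16k))  # 16000
```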

Limitations

  • Model capacity: whisper-base (72.6M params) has a WER ceiling for this dialect — whisper-small is recommended for production
  • Short clips: Clips under 1.5 seconds remain unreliable
  • Heavy dialect: Non-standard Bahraini words cause failures
  • Code-switching: English word preservation is inconsistent
  • Test labels: Evaluation uses model-generated labels, not human annotations

Intended Use

Designed for integration into Nota, a privacy-first AI meeting assistant. Best suited for:

  • Meeting transcription (post-processing)
  • Transcript search and summarization
  • Bahraini Arabic speech with mixed English terminology

Not recommended for:

  • Real-time live captioning where accuracy is critical
  • Single-word or very short utterance recognition

Citation

If you use this model, please cite:

  • University of Bahrain - ITCS 499 Senior Project 2025-2026
  • Nota AI Meeting Assistant
  • Fine-tuned Whisper-base for Bahraini Arabic ASR