---
language:
- ak
license: cc-by-nc-4.0
base_model: katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment
tags:
- whisper
- lora
- asante-twi
- akan
- speech-recognition
- peft
datasets:
- mozilla-foundation/common_voice_11_0
- michsethowusu/twi_multispeaker_audio_transcribed
---

# Whisper Large v3 Turbo — Asante Twi LoRA Adapter (R14)

Fine-tuned LoRA adapter for Asante Twi automatic speech recognition, built on top of
`katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment`.
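The card does not include an inference snippet. The block below is a minimal, hypothetical sketch of how the adapter could be used. It assumes `transformers`, `peft`, `torch` and `librosa` are installed. Of these, `librosa` is an assumption and is used only to load and resample audio to 16 kHz. The sketch also assumes the base repository ships the Whisper processor files, and it takes the adapter repository id from the citation URL on this card. It loads the base model, attaches and merges the adapter, then decodes with the anti-hallucination settings listed under Training Configuration. The fp16-on-GPU choice and the merge step are illustrative and do not document the authors' setup.

```python
# Hypothetical inference sketch (not the authors' documented pipeline).
BASE_ID = "katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment"
# Adapter repo id assumed from the citation URL on this card.
ADAPTER_ID = "rootabytes/whisper-large-v3-turbo-asante-twi-lvp"
SAMPLING_RATE = 16_000  # Whisper feature extractors expect 16 kHz mono audio

# Decoding settings copied from "Training Configuration".
GENERATE_KWARGS = {
    "language": None,                   # Whisper has no Twi/Akan language token
    "condition_on_prev_tokens": False,  # anti-hallucination setting
    "repetition_penalty": 1.2,          # anti-hallucination setting
}


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the base model plus the LoRA adapter."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import librosa  # assumption: used only for loading/resampling audio
    import torch
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Assumes the base repo ships the processor (feature extractor + tokenizer).
    processor = WhisperProcessor.from_pretrained(BASE_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_ID, torch_dtype=dtype)
    # Attach the adapter, then merge it into the base weights for plain generate().
    model = PeftModel.from_pretrained(base, ADAPTER_ID).merge_and_unload()
    model.to(device).eval()

    audio, _ = librosa.load(audio_path, sr=SAMPLING_RATE, mono=True)
    features = processor.feature_extractor(
        audio, sampling_rate=SAMPLING_RATE, return_tensors="pt"
    ).input_features.to(device, dtype)

    with torch.no_grad():
        token_ids = model.generate(input_features=features, **GENERATE_KWARGS)
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]
```

Usage would be `print(transcribe("sample.wav"))`, where `sample.wav` is a hypothetical clip of 30 seconds or less. Merging removes the adapter indirection at inference time. If you want to swap adapters on the same base model, skip `merge_and_unload()`.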

**WER: 17.5%** on the held-out LVP evaluation set (pilot-readiness threshold: WER < 22%)
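The card does not state how LVP transcripts were normalized (for example punctuation, casing or diacritics) before scoring. The block below is therefore only a minimal sketch of standard corpus-level word error rate. It assumes lowercasing plus whitespace tokenization, so it may not reproduce the 17.5% figure exactly. Corpus WER is the total word-level edit distance (substitutions + deletions + insertions) divided by the total number of reference words. `corpus_wer` and `is_pilot_ready` are hypothetical helper names; the 22% threshold comes from the line above.

```python
PILOT_READY_THRESHOLD = 0.22  # from this card: pilot-ready means WER < 22%


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total edits over total reference words (assumed normalization: lowercase + split)."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        edits += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


def is_pilot_ready(wer: float) -> bool:
    """Hypothetical helper applying the card's < 22% pilot threshold."""
    return wer < PILOT_READY_THRESHOLD


pairs = [
    ("Me din de Kofi", "me din Kofi"),  # one deleted word
    ("Wo ho te sɛn", "wo ho te sɛn"),   # exact match after lowercasing
]
score = corpus_wer(pairs)
print(f"WER = {score:.1%}, pilot-ready: {is_pilot_ready(score)}")
# prints: WER = 12.5%, pilot-ready: True
```

Pooling edits across the whole set differs from averaging per-utterance WERs. The per-utterance average over-weights short clips, where a single error can dominate the score.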

## Training Data

| Dataset | Role | Notes |
|---|---|---|
| LVP real recordings (private) | Training + eval | Collected via the Rootal Audio Annotation Platform (rootal.ai); available on request |
| LVP synthetic QA (private) | Training | TTS-generated Twi Q&A pairs |
| Common Voice Akan | Training | Mozilla; CC0 |
| Financial Inclusion Speech Dataset (Ashesi) | Training (200 samples) | See citation below |
| `michsethowusu/twi_multispeaker_audio_transcribed` | Eval-only diagnostic | Excluded from training because of a transcription-style mismatch |

## Training Configuration

- **Base model**: `katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment`
- **LoRA**: rank=32, alpha=64; target modules: `q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2`
- **Language**: `None` (Whisper has no Twi/Akan language token, so no language prefix token is set)
- **Anti-hallucination decoding**: `condition_on_prev_tokens=False`, `repetition_penalty=1.2`
- **Quantization**: 8-bit (bitsandbytes)
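The LoRA settings above can be made concrete with some arithmetic. The sketch below counts the trainable adapter parameters and the standard LoRA scaling factor alpha / r. It rests on several assumptions that this card does not state. The Whisper large-v3-turbo dimensions (d_model 1280, FFN 5120, 32 encoder layers and 4 decoder layers) come from the upstream model configuration. The listed target names are assumed to match every attention projection, including decoder cross-attention. No biases or extra modules are assumed to be trained. If the adapter was configured differently, the count will differ.

```python
# Assumed Whisper large-v3-turbo dimensions (upstream config, not this card).
D_MODEL = 1280
FFN_DIM = 5120
ENCODER_LAYERS = 32
DECODER_LAYERS = 4

# LoRA hyperparameters from this card.
RANK = 32
ALPHA = 64


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return rank * in_features + out_features * rank


def attention_block() -> int:
    # q_proj, k_proj, v_proj and out_proj all map d_model -> d_model.
    return 4 * lora_params(D_MODEL, D_MODEL)


def ffn_block() -> int:
    # fc1: d_model -> ffn_dim; fc2: ffn_dim -> d_model.
    return lora_params(D_MODEL, FFN_DIM) + lora_params(FFN_DIM, D_MODEL)


def total_lora_params() -> int:
    encoder = ENCODER_LAYERS * (attention_block() + ffn_block())
    # Decoder layers have self-attention and cross-attention with the same projection names.
    decoder = DECODER_LAYERS * (2 * attention_block() + ffn_block())
    return encoder + decoder


SCALING = ALPHA / RANK  # the low-rank update B @ A is multiplied by alpha / r

print(f"scaling = {SCALING}")                      # prints: scaling = 2.0
print(f"LoRA params = {total_lora_params():,}")    # prints: LoRA params = 27,852,800
```

Under these assumptions, the adapter adds roughly 28M trainable parameters on top of a frozen base model. The updates are scaled by 2.0.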

## Citation

If you use this adapter, please cite:

```bibtex
@misc{aguyatimothy2025asantetwi,
  author    = {Timothy Aguya, Akasiya},
  title     = {Whisper Large v3 Turbo — Asante Twi LoRA Adapter},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/rootabytes/whisper-large-v3-turbo-asante-twi-lvp}
}

@misc{financialinclusion2022,
  author    = {Asamoah Owusu, D. and Korsah, A. and Quartey, B. and Nwolley Jnr., S.
               and Sampah, D. and Adjepon-Yamoah, D. and Omane Boateng, L.},
  title     = {Financial Inclusion Speech Dataset},
  year      = {2022},
  publisher = {Ashesi University and Nokwary Technologies},
  url       = {https://github.com/Ashesi-Org/Financial-Inclusion-Speech-Dataset}
}

@inproceedings{ardila2020common,
  title     = {Common Voice: A Massively-Multilingual Speech Corpus},
  author    = {Ardila, Rosana and others},
  booktitle = {LREC},
  year      = {2020}
}

@article{radford2022robust,
  title   = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author  = {Radford, Alec and others},
  journal = {arXiv:2212.04356},
  year    = {2022}
}

@article{hu2021lora,
  title   = {LoRA: Low-Rank Adaptation of Large Language Models},
  author  = {Hu, Edward J and others},
  journal = {arXiv:2106.09685},
  year    = {2021}
}
```