	---
	language:
	- ak
	license: cc-by-nc-4.0
	base_model: katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment
	tags:
	- whisper
	- lora
	- asante-twi
	- akan
	- speech-recognition
	- peft
	datasets:
	- mozilla-foundation/common_voice_11_0
	- michsethowusu/twi_multispeaker_audio_transcribed
	---

	# Whisper Large v3 Turbo — Asante Twi LoRA Adapter (R14)

	Fine-tuned LoRA adapter for Asante Twi automatic speech recognition, built on top of
	`katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment`.

	Word error rate (WER): 17.5% on the LVP held-out evaluation set (pilot-ready threshold: below 22%).
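
The card does not state how the reported WER was normalized or aggregated, so the following is only a minimal sketch of the standard per-utterance definition: word-level edit distance divided by the number of reference words. The names `word_error_rate`, `is_pilot_ready`, and `PILOT_READY_THRESHOLD` are hypothetical and are not part of this repository. The example strings are placeholder tokens, not real transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Threshold taken from the model card: WER must be below 22%.
PILOT_READY_THRESHOLD = 0.22


def is_pilot_ready(wer: float) -> bool:
    """Hypothetical helper: True when WER is strictly below the threshold."""
    return wer < PILOT_READY_THRESHOLD


# One substitution (b -> x) and one deletion (d) over four reference words.
example_wer = word_error_rate("a b c d", "a x c")  # 0.5
```

A corpus-level figure is usually computed by summing edits and reference words over all utterances before dividing, which can differ from the mean of per-utterance scores.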

	## Training Data

	| Dataset | Role | Notes |
	|---|---|---|
	| LVP real recordings (private) | Training + eval | Collected via Rootal Audio Annotation Platform @rootal.ai; available on request |
	| LVP synthetic QA (private) | Training | TTS-generated Twi Q&A pairs |
	| Common Voice Akan | Training | Mozilla CC0 |
	| Financial Inclusion Speech Dataset (Ashesi) | Training (200 samples) | See citation below |
	| michsethowusu/twi_multispeaker_audio_transcribed | Eval-only diagnostic | Excluded from training (transcription style mismatch) |

	## Training Configuration

	- Base model: `katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment`
	- LoRA: rank=32, alpha=64, target modules: `q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2`
	- Language: `None` (Twi has no language token in the Whisper vocabulary, so no language prefix token is used)
	- Anti-hallucination: `condition_on_prev_tokens=False`, `repetition_penalty=1.2`
	- Quantization: 8-bit (BitsAndBytes)
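
To make the LoRA and anti-hallucination settings above more concrete, here is a rough pure-Python sketch of what they imply numerically. It assumes the conventional LoRA scaling of `alpha / rank` and the common repetition-penalty rule (divide positive logits of already generated tokens by the penalty, multiply negative ones). The layer widths `D_MODEL = 1280` and `D_FFN = 5120` are assumptions about the Whisper large architecture, not values stated in this card, and all function names are hypothetical.

```python
from __future__ import annotations

RANK = 32    # from the card
ALPHA = 64   # from the card
REPETITION_PENALTY = 1.2  # from the card

# Assumed layer widths for Whisper large models (not stated in this card).
D_MODEL = 1280
D_FFN = 5120


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A under standard LoRA scaling."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def apply_repetition_penalty(
    logits: list[float], generated_ids: list[int], penalty: float
) -> list[float]:
    """Down-weight tokens that were already generated, leaving other logits untouched."""
    penalized = list(logits)
    for token_id in set(generated_ids):
        score = penalized[token_id]
        # Positive scores shrink toward zero; negative scores move further from zero.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


scaling = lora_scaling(ALPHA, RANK)                        # 2.0
attn_proj_params = lora_param_count(D_MODEL, D_MODEL, RANK)  # per q/k/v/out projection
fc1_params = lora_param_count(D_MODEL, D_FFN, RANK)          # per fc1 (fc2 is symmetric)
```

With rank 32 and alpha 64 the low-rank update is scaled by 2, and each adapted layer adds only `rank * (d_in + d_out)` trainable weights, regardless of how the frozen base weights are quantized.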

	## Citation

	If you use this adapter, please cite:

	```bibtex
	@misc{aguyatimothy2025asantetwi,
	author = {Timothy Aguya, Akasiya},
	title = {Whisper Large v3 Turbo — Asante Twi LoRA Adapter},
	year = {2026},
	publisher = {HuggingFace},
	url = {https://huggingface.co/rootabytes/whisper-large-v3-turbo-asante-twi-lvp}
	}

	@misc{financialinclusion2022,
	author = {Asamoah Owusu, D. and Korsah, A. and Quartey, B. and Nwolley Jnr., S.
	and Sampah, D. and Adjepon-Yamoah, D. and Omane Boateng, L.},
	title = {Financial Inclusion Speech Dataset},
	year = {2022},
	publisher = {Ashesi University and Nokwary Technologies},
	url = {https://github.com/Ashesi-Org/Financial-Inclusion-Speech-Dataset}
	}

	@inproceedings{ardila2020common,
	title = {Common Voice: A Massively-Multilingual Speech Corpus},
	author = {Ardila, Rosana and others},
	booktitle = {LREC},
	year = {2020}
	}

	@article{radford2022robust,
	title = {Robust Speech Recognition via Large-Scale Weak Supervision},
	author = {Radford, Alec and others},
	journal = {arXiv:2212.04356},
	year = {2022}
	}

	@article{hu2021lora,
	title = {LoRA: Low-Rank Adaptation of Large Language Models},
	author = {Hu, Edward J and others},
	journal = {arXiv:2106.09685},
	year = {2021}
	}
	```