bol-tts-marathi — Kokoro-82M fine-tuned for Marathi
A Marathi (मराठी) text-to-speech fine-tune of hexgrad/Kokoro-82M, trained with the semidark/kokoro-deutsch recipe. Handles pure Marathi and Minglish (Marathi + English code-switching) via a client-side Devanagari-transliteration preprocessor.
- Architecture: StyleTTS2 acoustic model + ISTFTNet decoder (Kokoro-82M, unchanged)
- Parameters: 81.76 M
- Sample rate: 24 kHz
- Voices: 25 (4 Marathi-trained + 19 stock-Kokoro crossovers + 2 synthetic) — see voice catalog below
- Live demo: shreyask/bol-tts-marathi (in-browser via WebGPU)
- Write-up: kshreyas.dev/post/bol-tts-marathi — full design + debugging story with audio samples
- Code: github.com/shreyaskarnik/bol-tts-marathi
- ONNX export: shreyask/bol-tts-marathi-onnx
Voice catalog (25 voices)
Marathi-trained (4)
| ID | Display | Source | Default speed |
|---|---|---|---|
| mf_asha | Asha (आशा) | Rasa marathi_female | 1.00× |
| mm_vivek | Vivek (विवेक) | Rasa marathi_male | 1.00× |
| mf_mukta | Mukta (मुक्ता) | SPRINGLab female | 0.80× |
| mm_dnyanesh | Dnyanesh (ज्ञानेश) | SPRINGLab male | 0.80× |
Stock-Kokoro crossovers (19)
Stock voicepacks from hexgrad/kokoro.js used as ref_s on this fine-tune. Because v0.2 is a continuation fine-tune, the encoder latent space stays close enough to stock Kokoro's that stock voicepacks plug in directly. Candidates were pre-screened for peak amplitude < 0.95 to filter out voicepacks that clip.
| ID | Display | Source language |
|---|---|---|
| af_heart | Svara (स्वरा) | US English F |
| af_alloy | Anvita (अन्विता) | US English F |
| af_aoede | Sanika (सानिका) | US English F |
| af_bella | Naina (नैना) | US English F |
| af_jessica | Ishani (ईशानी) | US English F |
| af_nova | Tara (तारा) | US English F |
| af_sarah | Kavya (काव्या) | US English F |
| af_sky | Akasha (आकाशा) | US English F |
| am_liam | Atharv (अथर्व) | US English M |
| bf_isabella | Ira (इरा) | UK English F |
| bm_fable | Aaryan (आर्यन) | UK English M |
| ff_siwis | Esha (ईशा) | French F |
| hm_omega | Vihaan (विहान) | Hindi M |
| im_nicola | Niraj (निरज) | Italian M |
| pf_dora | Rhea (रिया) | Portuguese F |
| zf_xiaoni | Nyra (नयरा) | Mandarin F |
| zf_xiaoxiao | Pari (परी) | Mandarin F (kid) |
| zf_xiaoyi | Vir (वीर) | Mandarin F (perceived M kid) |
| zm_yunyang | Aakash (आकाश) | Mandarin M |
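The peak-based pre-screen mentioned above reduces to a one-line amplitude check. A minimal sketch (the helper name and toy waveforms are illustrative; the real screening synthesizes a Marathi test sentence with each stock voicepack first):

```python
import numpy as np

def passes_peak_screen(audio: np.ndarray, threshold: float = 0.95) -> bool:
    """Keep a candidate voicepack only if its synthesized audio stays
    below the threshold at every sample, i.e. it does not clip."""
    return float(np.abs(audio).max()) < threshold

# Toy stand-ins for synthesized audio: one well-scaled, one that clips.
t = np.linspace(0, 2 * np.pi * 440, 24000)
clean = 0.5 * np.sin(t)
hot = 1.2 * np.sin(t)  # exceeds full scale -> would clip on write

print(passes_peak_screen(clean))  # True
print(passes_peak_screen(hot))    # False
```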
Synthetic — generated arithmetically with no reference audio (2)
| ID | Display | Recipe |
|---|---|---|
| syn_sama | Sama (समा) | Centroid (mean) of 5 modern English female voicepacks |
| syn_navya | Navya (नव्या) | Centroid + per-position Gaussian noise (1σ) |
The voicepack tensor [510, 1, 256] is a plain embedding — it can be constructed by averaging existing voicepacks, sampling near the centroid, or interpolating. See voicepack zoo in the repo for recipes.
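As a sketch of the two synthetic recipes above, the arithmetic is a few tensor operations. Random tensors stand in for real voicepacks here; in practice each would be loaded with `torch.load("voices/<id>.pt", ...)`, and which five "modern English female" packs to average is a choice, not something this snippet determines:

```python
import torch

torch.manual_seed(0)
# Stand-ins for five loaded voicepacks, each shaped [510, 1, 256].
packs = torch.stack([torch.randn(510, 1, 256) for _ in range(5)])

# syn_sama-style: centroid (elementwise mean) of the source voicepacks.
centroid = packs.mean(dim=0)

# syn_navya-style: centroid plus per-position Gaussian noise scaled to
# 1 sigma of the source packs at each position.
sigma = packs.std(dim=0)
noisy = centroid + torch.randn_like(centroid) * sigma

assert centroid.shape == (510, 1, 256)
assert noisy.shape == (510, 1, 256)
```

Interpolation between two packs is the same idea: `a * p1 + (1 - a) * p2` for `a` in [0, 1].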
Usage
```python
import torch
import soundfile as sf
from kokoro import KModel, KPipeline
import kokoro.pipeline as _kp

_kp.LANG_CODES["m"] = "mr"  # monkey-patch a Marathi lang code into Kokoro

kmodel = KModel(
    repo_id="shreyask/bol-tts-marathi",
    config="config.json",
    model="kokoro-mr-v0_2.pth",
)
kmodel.eval()  # inference mode (same as kmodel.train(False))

pipeline = KPipeline(lang_code="m", repo_id="shreyask/bol-tts-marathi", model=kmodel)
voice = torch.load("voices/mf_asha.pt", map_location="cpu", weights_only=True)

text = "नमस्कार, मी मराठी बोलतो."
chunks = []
for _gs, _ps, audio in pipeline(text, voice=voice, speed=1.0):
    chunks.append(audio)

audio = chunks[0] if len(chunks) == 1 else torch.cat(chunks)
sf.write("out.wav", audio.numpy(), 24000)
```
Minglish (loanword) handling
For Marathi mixed with English ("Friday ला Zomato वर dinner order करूया का?"), use the loanword preprocessor first to transliterate Latin tokens to Devanagari before phonemization:
```python
from preprocess_loanwords import preprocess

text = preprocess("Friday ला Zomato वर dinner order करूया का?")
# → "फ्रायडे ला झोमॅटो वर डिनर ऑर्डर करूया का?"
# Then feed `text` to the pipeline as usual.
```
Source + ~19,500-entry lookup table: scripts/preprocess_loanwords.py and data/loanword_map.json.
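The core of the preprocessor is a dictionary lookup over Latin-script tokens. A minimal self-contained sketch (the three-entry map below is an illustrative subset; `data/loanword_map.json` is the real source of truth, and the actual script may normalize tokens differently):

```python
import re

# Illustrative subset of the ~19,500-entry loanword map.
LOANWORD_MAP = {"friday": "फ्रायडे", "dinner": "डिनर", "order": "ऑर्डर"}

def preprocess(text: str, table: dict = LOANWORD_MAP) -> str:
    """Replace Latin-script tokens with their Devanagari transliteration,
    leaving unknown (long-tail) tokens untouched."""
    def sub(m: re.Match) -> str:
        return table.get(m.group(0).lower(), m.group(0))
    return re.sub(r"[A-Za-z]+", sub, text)

print(preprocess("Friday ला dinner order करूया का?"))
# → "फ्रायडे ला डिनर ऑर्डर करूया का?"
# A word missing from the map (e.g. "Zomato" here) passes through unchanged.
```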
Per-voice timestamps
Kokoro predicts per-phoneme durations. KModel.forward_with_tokens returns (audio, pred_dur). pred_dur is measured in predictor frames, where 1 frame = 600 audio samples at 24 kHz, i.e. 25 ms (the prosody predictor runs at half the mel-frame rate; the decoder upsamples 2× before iSTFT):
```python
audio, pred_dur = kmodel.forward_with_tokens(input_ids, ref_s, speed=1.0)
durations_sec = pred_dur.squeeze().cpu().numpy() * 600 / 24000
starts = durations_sec.cumsum() - durations_sec
# (starts[i], starts[i] + durations_sec[i]) is the time span of phoneme[i]
```
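The frame-to-seconds arithmetic can be sanity-checked on a toy duration vector, with no model in the loop (the duration values below are made up):

```python
import numpy as np

pred_dur = np.array([4, 10, 6])  # toy per-phoneme durations in predictor frames
durations_sec = pred_dur * 600 / 24000  # 25 ms per frame -> [0.1, 0.25, 0.15]
starts = durations_sec.cumsum() - durations_sec
spans = [(float(s), float(s + d)) for s, d in zip(starts, durations_sec)]
# Each span is contiguous: phoneme i ends exactly where phoneme i+1 starts.
```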
Training
| Phase | Details |
|---|---|
| Base | hexgrad/Kokoro-82M |
| Stage 1 | 10 epochs, bs=12, fp32, ~9h on A100 SXM 80GB. Final val_loss ≈ 0.23 |
| Stage 2 | 10 epochs, bs=8, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, ~13h |
| Train utts | 24,676 (95/5 split) |
| Speakers | 331 (2 Rasa + 329 IndicVoices-R) + SPRINGLab IndicTTS-Marathi (single F + single M) |
| Vocab change | ɭ (U+026D, retroflex lateral) at Kokoro slot 144 — Marathi-specific phoneme that Hindi doesn't have |
Full methodology: TRAINING_GUIDE.md.
Datasets
- AI4Bharat/Rasa (CC-BY-4.0) — Marathi, 13,900 studio-quality utts, 2 speakers.
- AI4Bharat/IndicVoices-R (CC-BY-4.0, gated) — Marathi, ~11,910 utts, 329 speakers after filtering.
- SPRINGLab/IndicTTS-Marathi (IITM EULA, commercial-OK) — single female + single male speaker, used for Mukta + Dnyanesh.
Limitations
- Pure-English sentences — the decoder hallucinates Marathi acoustics when given no Devanagari context at all. The Minglish trick handles mixed input via Devanagari transliteration; pure English needs a different fallback.
- Long-tail loanwords — the 19,500-entry map covers high-frequency English words in Indian usage; rarer words fall through to espeak-mr unchanged.
- Decoder English leakage is accidental, not designed — v0.2's decoder happens to render /ɟʰ/ (Devanagari झ) with an English-flavored /z/ quality, which makes "amazing" → अमेझिंग → an audible "amazing". The follow-up v0.5 retraining lost this property by being more faithfully Marathi; v0.6 is planned to preserve the leakage deliberately.
License
Apache 2.0. Training data under their respective licenses (Rasa CC-BY-4.0, IndicVoices-R CC-BY-4.0, SPRINGLab IITM EULA).
Citation
```bibtex
@software{bol_tts_marathi_2026,
  title={bol-tts-marathi: Kokoro-82M fine-tuned for Marathi},
  author={Karnik, Shreyas},
  year={2026},
  url={https://github.com/shreyaskarnik/bol-tts-marathi},
  license={Apache-2.0}
}

@software{kokoro_2025,
  title={Kokoro-82M},
  author={hexgrad},
  year={2025},
  url={https://github.com/hexgrad/kokoro}
}

@software{kokoro_deutsch_2026,
  title={kokoro-deutsch},
  author={semidark},
  year={2026},
  url={https://github.com/semidark/kokoro-deutsch}
}
```