
Saudi XTTS-v2

A fine-tuned XTTS-v2 model for Saudi Arabic / English code-switching text-to-speech, with a single female speaker voice ("Hoda").

Model Details

| Detail | Value |
|---|---|
| Base model | coqui/XTTS-v2 |
| Fine-tuned on | ~50 000 synthetic Saudi Arabic sentences |
| Speaker | Hoda (female, Saudi) |
| Languages | Arabic (ar) + English (en) |
| Sample rate | 24 000 Hz |
| Training dialect | Najdi / Saudi colloquial Arabic (ars) |
| Precision | fp16 |
| Checkpoint step | 320 190 |

Training Data

The model was trained on ~50 000 synthetically generated sentences covering:

  • Saudi dialect corpus segments (TaghreedT/SDC, ~60%)
  • Code-switching templates mixing Saudi Arabic and English (~24%)
  • Number verbalization sentences: digits rendered in Saudi colloquial words (~16%)

Audio was synthesized with the lahgtna-omnivoice-v2 TTS system using the Najdi Arabic (ars) voice.
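
As a rough illustration of the mix above, the stated percentages can be turned into approximate sentence counts. This is only arithmetic on the rounded figures in this card (about 50 000 sentences, split 60/24/16); the category keys and the function name are hypothetical labels chosen for the sketch, not names from the training pipeline.

```python
# Approximate sentence counts per category, derived from the rounded
# total and percentages quoted in this model card.
TOTAL_SENTENCES = 50_000  # approximate figure from the card

# Hypothetical category labels; the percentages come from the list above.
MIX_PERCENT = {
    "saudi_dialect_corpus": 60,
    "code_switching_templates": 24,
    "number_verbalization": 16,
}


def approximate_counts(total, mix_percent):
    """Return {category: estimated sentence count} using integer arithmetic."""
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


counts = approximate_counts(TOTAL_SENTENCES, MIX_PERCENT)
for name, count in counts.items():
    print(f"{name}: ~{count}")
```

Because the card only gives rounded values, these counts should be read as estimates rather than exact dataset sizes.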

Files

| File | Description |
|---|---|
| model.pth | Fine-tuned GPT weights (upload this as the XTTS checkpoint) |
| config.json | Training / inference configuration |
| vocab.json | XTTS-v2 tokenizer vocabulary |
| dvae.pth | Discrete VAE (from XTTS-v2 base, unchanged) |
| mel_stats.pth | Mel spectrogram normalisation stats (from XTTS-v2 base, unchanged) |
| speakers_xtts.pth | Speaker embedding library (from XTTS-v2 base) |
| reference_audios/hoda.wav | Reference audio for voice cloning |
| references.json | Speaker metadata |
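
Before loading the model from a local copy, it can help to confirm that every file listed above is present. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the file list is simply copied from the table.

```python
from pathlib import Path

# File names copied from the Files table in this model card.
EXPECTED_FILES = [
    "model.pth",
    "config.json",
    "vocab.json",
    "dvae.pth",
    "mel_stats.pth",
    "speakers_xtts.pth",
    "reference_audios/hoda.wav",
    "references.json",
]


def missing_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # An empty list means the local directory looks complete.
    print(missing_files("."))
```

The check only tests for existence; it does not verify that the files are valid checkpoints or valid JSON.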

Quick Start

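import os

from huggingface_hub import snapshot_download
from TTS.api import TTS

# TTS(model_path=...) expects a local path, so download the repository first.
model_dir = snapshot_download("Rabe3/saudi-xtts-v2")

tts = TTS(
    model_path=model_dir,
    config_path=os.path.join(model_dir, "config.json"),
    progress_bar=True,
)

tts.tts_to_file(
    # "How are you? What are you up to today?"
    text="كيف الحال؟ وش قاعد تسوي اليوم؟",
    speaker_wav=os.path.join(model_dir, "reference_audios/hoda.wav"),
    language="ar",
    file_path="output.wav",
)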

Or using the XTTS model directly:

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_dir=".",   # directory containing model.pth, dvae.pth, etc.
    use_deepspeed=False,
)
model.cuda()

outputs = model.synthesize(
    # "I want to tell you about the new project that we are working on"
    text="أبغى أقول لك عن the new project اللي قاعدين نشتغل عليه",
    config=config,
    speaker_wav="reference_audios/hoda.wav",
    language="ar",
)
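
The `synthesize` call returns the waveform in memory rather than writing a file. A minimal sketch of saving it at the model's 24 000 Hz sample rate follows, using only the standard library. It assumes `outputs["wav"]` is a one-dimensional sequence of floats roughly in [-1, 1]; the helper name `save_wav` is hypothetical, and a synthetic tone stands in for the model output so the sketch runs without the model.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, as listed under Model Details


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for outputs["wav"]: one second of a 440 Hz tone.
demo = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
save_wav(demo, "demo_tone.wav")
```

With the real model, the call would look like `save_wav(outputs["wav"], "output.wav")`, under the same assumption about the array's shape and range.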

License

CC BY-NC 4.0: non-commercial use only, consistent with the XTTS-v2 base model license.
