SE-Bridge-TTS Weights

This model repository hosts the public release checkpoints for SE-Bridge-TTS and serves as the project page for the ICML 2026 paper "Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models".

Hugging Face Classification

  • Repository type: model
  • Task / pipeline: text-to-speech
  • Library: pytorch
  • Languages: Thai (th) and Lao (lo)
  • Primary tags: text-to-speech, speech-synthesis, thai, lao, low-resource, spoken-language-model

Files

| File | Description |
| --- | --- |
| thai_tts.pt | Public Thai TTS checkpoint. |
| lao_tts.pt | Public Lao TTS checkpoint. |
| release_config.json | Sanitized release metadata for the two checkpoints. |
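
When only one language is needed, a single checkpoint can be fetched instead of the whole repository. The following is a minimal sketch, assuming the repository id `isabeth/SE-Bridge-TTS` and the file names listed above. The helper names `checkpoint_filename` and `fetch_checkpoint` are hypothetical and not part of this release; `hf_hub_download` is the standard single-file download function from `huggingface_hub`.

```python
from pathlib import Path

# Repository id and file names as listed in this model card.
HF_REPO_ID = "isabeth/SE-Bridge-TTS"
CHECKPOINTS = {"thai": "thai_tts.pt", "lao": "lao_tts.pt"}


def checkpoint_filename(language: str) -> str:
    """Return the checkpoint file name for 'thai' or 'lao' (hypothetical helper)."""
    if language not in CHECKPOINTS:
        raise ValueError(f"language must be one of {sorted(CHECKPOINTS)}")
    return CHECKPOINTS[language]


def fetch_checkpoint(language: str) -> Path:
    """Download one checkpoint and return its path in the local cache."""
    # Imported lazily so the name lookup above works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=HF_REPO_ID,
        filename=checkpoint_filename(language),
    )
    return Path(local_path)
```

Calling `fetch_checkpoint("lao")` would then download only `lao_tts.pt` and return its local path.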

Inference

The released files are CosyVoice2 LLM checkpoints. They are intended to be loaded with a CosyVoice2-compatible checkout and the standard CosyVoice2 base model assets. The base model directory should contain the normal CosyVoice2 configuration and acoustic/vocoder weights, while this repository supplies the Thai or Lao LLM checkpoint.
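
Before loading, it can help to confirm that the base model directory is complete. The sketch below is an illustration only: the file names in `EXPECTED_BASE_ASSETS` are an assumption based on the layout of the upstream CosyVoice2-0.5B release and may differ between CosyVoice versions, and `missing_base_assets` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# ASSUMPTION: file names follow the upstream CosyVoice2-0.5B layout.
# Adjust this tuple if your CosyVoice checkout expects different names.
EXPECTED_BASE_ASSETS = (
    "cosyvoice2.yaml",           # model configuration
    "llm.pt",                    # base LLM weights, replaced later by this release
    "flow.pt",                   # acoustic (flow) model weights
    "hift.pt",                   # vocoder weights
    "campplus.onnx",             # speaker embedding model
    "speech_tokenizer_v2.onnx",  # speech tokenizer
)


def missing_base_assets(base_model_dir) -> List[str]:
    """Return the expected asset names that are absent from base_model_dir."""
    base = Path(base_model_dir)
    return [name for name in EXPECTED_BASE_ASSETS if not (base / name).exists()]
```

An empty list from `missing_base_assets("pretrained_models/CosyVoice2-0.5B")` suggests the directory is ready. Any names it returns point to assets that still need to be downloaded.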

Recommended inference mode by language:

| Checkpoint | Language | Recommended mode |
| --- | --- | --- |
| thai_tts.pt | Thai (th) | Cross-lingual inference with inference_cross_lingual. |
| lao_tts.pt | Lao (lo) | Cross-lingual inference with inference_cross_lingual. |

For this release, use cross-lingual inference as the default path for both Thai and Lao. Thai can also be tried with the zero-shot inference API when stronger prompt-speaker resemblance is desired, but that mode may be less stable, so use it cautiously and compare outputs. Lao should remain on the cross-lingual path.
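
The recommendation above can be encoded as a small lookup so that calling code cannot select zero-shot for Lao by accident. This is a sketch, not part of the release: the function name `select_mode`, the flag `prefer_speaker_similarity`, and the mode labels are hypothetical.

```python
# Modes per language; the first entry is the recommended default.
SUPPORTED_MODES = {
    "thai": ("cross_lingual", "zero_shot"),
    "lao": ("cross_lingual",),
}


def select_mode(language: str, prefer_speaker_similarity: bool = False) -> str:
    """Pick an inference mode that follows this card's recommendations."""
    if language not in SUPPORTED_MODES:
        raise ValueError("language must be either 'thai' or 'lao'")
    modes = SUPPORTED_MODES[language]
    # Zero-shot is optional and offered for Thai only; it may be less stable.
    if prefer_speaker_similarity and "zero_shot" in modes:
        return "zero_shot"
    return modes[0]
```

With this mapping, `select_mode("lao", prefer_speaker_similarity=True)` still returns the cross-lingual mode, which matches the guidance that Lao should stay on that path.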

Install or prepare CosyVoice first:

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
pip install -r requirements.txt
pip install huggingface_hub torchaudio

Default cross-lingual inference example:

import sys
from pathlib import Path

import torch
import torchaudio
from huggingface_hub import snapshot_download

sys.path.append("third_party/Matcha-TTS")

from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav


HF_REPO_ID = "isabeth/SE-Bridge-TTS"
BASE_MODEL_DIR = Path("pretrained_models/CosyVoice2-0.5B")

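language = "thai"  # choose "thai" or "lao"; both default to cross-lingual
checkpoint_names = {
    "thai": "thai_tts.pt",
    "lao": "lao_tts.pt",
}
# Validate before the lookup so a bad value raises a clear error, not a KeyError.
if language not in checkpoint_names:
    raise ValueError("language must be either 'thai' or 'lao'")
checkpoint_name = checkpoint_names[language]

# Download the release files and locate the selected checkpoint.
weights_dir = Path(snapshot_download(HF_REPO_ID))
checkpoint_path = weights_dir / checkpoint_name

# Build the pipeline from the standard CosyVoice2 base assets.
cosyvoice = CosyVoice2(
    str(BASE_MODEL_DIR),
    load_jit=False,
    load_trt=False,
    load_vllm=False,
    fp16=False,
)
# Replace the base LLM weights with the Thai or Lao checkpoint.
# strict=False tolerates key mismatches without raising an error.
state_dict = torch.load(checkpoint_path, map_location="cpu")
cosyvoice.model.llm.load_state_dict(state_dict, strict=False)

# Prompt audio that supplies the reference voice, loaded at 16 kHz.
prompt_speech_16k = load_wav("prompt.wav", 16000)
tts_text = "Text to synthesize in the selected language."
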
outputs = cosyvoice.inference_cross_lingual(
    tts_text,
    prompt_speech_16k,
    stream=False,
)

for idx, output in enumerate(outputs):
    torchaudio.save(
        f"se_bridge_tts_{language}_cross_lingual_{idx}.wav",
        output["tts_speech"],
        cosyvoice.sample_rate,
    )
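
Because the example loads the checkpoint with `strict=False`, key mismatches between the checkpoint and the model are skipped without an error. The sketch below shows one way to see what was skipped by comparing the two sets of key names. `diff_state_dict_keys` is a hypothetical helper, and the key names in the test data are invented for illustration.

```python
from typing import Dict, Iterable, List


def diff_state_dict_keys(
    model_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
) -> Dict[str, List[str]]:
    """Compare parameter names between a model and a checkpoint.

    "missing" lists names the model has but the checkpoint lacks.
    "unexpected" lists names the checkpoint has but the model lacks.
    """
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    return {
        "missing": sorted(model_set - checkpoint_set),
        "unexpected": sorted(checkpoint_set - model_set),
    }
```

In the example above it could be called as `diff_state_dict_keys(cosyvoice.model.llm.state_dict(), state_dict)`, since iterating a state dict yields its key names. Two empty lists indicate that every LLM parameter was covered by the checkpoint.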

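Optional Thai zero-shot variant. It reuses cosyvoice, tts_text, and prompt_speech_16k from the cross-lingual example, and prompt_text must be the transcript of prompt.wav: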

language = "thai"
prompt_text = "Transcript of prompt.wav."
outputs = cosyvoice.inference_zero_shot(
    tts_text,
    prompt_text,
    prompt_speech_16k,
    stream=False,
)

for idx, output in enumerate(outputs):
    torchaudio.save(
        f"se_bridge_tts_thai_zero_shot_{idx}.wav",
        output["tts_speech"],
        cosyvoice.sample_rate,
    )

Release Notes

This release package has been sanitized for public distribution. Internal server paths, private data paths, training-stage names, and operational configuration details are intentionally omitted. The repository does not describe per-stage checkpoint construction methods.
