VoxCPM2 - 4-bit quantized
MLX port of openbmb/VoxCPM2: a 2B-parameter multilingual TTS model with 48kHz studio-quality output, voice cloning, and voice design.
4-bit quantized: only the LM layers are quantized, while the VAE and DiT stay at full precision. This is the fastest and smallest variant, with minimal quality loss.
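"LM layers only" can be made concrete with group-wise affine 4-bit quantization. In this scheme each group of weights is stored as 4-bit codes (0 to 15) plus one scale and one bias, and a path filter decides which layers are quantized at all. The sketch below is a generic scheme for illustration, not MLX's exact algorithm. The group size of 64 and the `lm.` module prefix are hypothetical.

```python
# Generic group-wise affine 4-bit quantization, for illustration only.
# Assumptions: group size 64 and the "lm." prefix are hypothetical, and
# MLX's real packing/rounding details differ from this sketch.
import math
from typing import List, Tuple

BITS = 4
LEVELS = (1 << BITS) - 1       # 4-bit codes span 0..15
GROUP_SIZE = 64                # hypothetical group size
QUANTIZED_PREFIXES = ("lm.",)  # hypothetical module-path prefix for LM layers


def should_quantize(path: str) -> bool:
    """Quantize LM layers only; VAE/DiT weights stay at full precision."""
    return path.startswith(QUANTIZED_PREFIXES)


def quantize_group(weights: List[float]) -> Tuple[List[int], float, float]:
    """Encode one group as 4-bit codes plus a shared scale and bias (the group minimum)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate weights as code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize(weights: List[float],
             group_size: int = GROUP_SIZE) -> List[Tuple[List[int], float, float]]:
    """Quantize a flat weight vector one group at a time."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


# Usage: quantize a toy 128-weight "layer" and measure round-trip error.
weights = [math.sin(i / 7) for i in range(128)]
groups = quantize(weights)
restored = [x for codes, s, b in groups for x in dequantize_group(codes, s, b)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each reconstructed weight lands within half a quantization step (`scale / 2`) of the original. Groups with a narrower value range therefore lose less precision.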
Features
- 30 languages, including English, Chinese, Indonesian, Japanese, Korean, and more
- 48kHz output: studio-quality audio
- Voice Design: create voices from text descriptions (no reference audio needed)
- Voice Cloning: clone any voice from a short audio reference
- 4 generation modes: zero-shot, continuation, reference cloning, and combined
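If you save samples yourself, the file must declare the 48kHz output rate. A file tagged with the wrong rate plays back at the wrong speed and pitch. This stdlib-only sketch writes a 16-bit PCM WAV. It assumes you already have mono float samples in [-1.0, 1.0]. How you obtain them from an mlx-audio generation result is not shown in this card, so that step is an assumption left to you.

```python
# Sketch: save mono float samples as a 16-bit PCM WAV tagged at 48 kHz.
# Assumption: samples are mono floats in [-1.0, 1.0]; stdlib only.
import math
import os
import struct
import tempfile
import wave
from typing import Iterable

SAMPLE_RATE = 48_000  # VoxCPM2's stated output rate


def write_wav(path: str, samples: Iterable[float], sample_rate: int = SAMPLE_RATE) -> int:
    """Clip to [-1, 1], scale to int16, write a mono WAV, and return the frame count."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Usage: one second of a 440 Hz tone is 48,000 samples at 48 kHz.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
out_path = os.path.join(tempfile.gettempdir(), "voxcpm2_tone.wav")
frames = write_wav(out_path, tone)
```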
Usage
```bash
pip install mlx-audio
```

```bash
# Zero-shot
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit --text "Hello world" --verbose

# Voice design
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit \
  --text "Hello world" \
  --instruct "A young woman, gentle and sweet voice"

# Voice cloning (--ref_text is the transcript of speaker.wav)
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit \
  --text "Hello world" \
  --ref_audio speaker.wav --ref_text "reference text"
```
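For batch jobs, the three commands above can be built programmatically. This sketch uses only the flags that appear in this card. The helper names are hypothetical. Rejecting `ref_text` without `ref_audio` is this helper's own rule, not documented CLI behavior.

```python
# Sketch: build mlx_audio.tts.generate argument lists from the flags shown above.
# Only flags that appear in this card are used; the helper names are hypothetical.
import shlex
import subprocess
import sys
from typing import List, Optional

MODEL = "mlx-community/VoxCPM2-4bit"


def build_command(text: str,
                  instruct: Optional[str] = None,
                  ref_audio: Optional[str] = None,
                  ref_text: Optional[str] = None,
                  verbose: bool = False) -> List[str]:
    """Return argv for zero-shot (text only), voice design, or voice cloning."""
    if ref_text is not None and ref_audio is None:
        # This helper's own rule: a reference transcript needs its audio.
        raise ValueError("ref_text requires ref_audio")
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate",
           "--model", MODEL, "--text", text]
    if instruct is not None:
        cmd += ["--instruct", instruct]
    if ref_audio is not None:
        cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:
        cmd += ["--ref_text", ref_text]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run(cmd: List[str]) -> None:
    """Execute one command; requires mlx-audio to be installed."""
    subprocess.run(cmd, check=True)


# Usage: print the voice-cloning command without running it.
clone_cmd = build_command("Hello world", ref_audio="speaker.wav", ref_text="reference text")
print(shlex.join(clone_cmd))
```

Returning an argv list rather than a shell string avoids quoting problems with text that contains spaces or quotes. `shlex.join` is used only for display.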
Python API
```python
from mlx_audio.tts import load_model

model = load_model("mlx-community/VoxCPM2-4bit")

# Generate
for result in model.generate(
    text="Hello, this is VoxCPM2 on Apple Silicon.",
    inference_timesteps=7,  # sampling steps; the Performance table uses 7
    cfg_value=2.0,          # classifier-free guidance strength
):
    print(f"Duration: {result.audio_duration}")
```
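The two tuning knobs in the example above are `inference_timesteps` and `cfg_value`. Conceptually, each sampling step queries the model with and without conditioning. It combines the two predictions with the standard classifier-free guidance rule `v_uncond + cfg_value * (v_cond - v_uncond)` and then takes one solver step. The toy sketch below assumes a flow-matching-style sampler with plain Euler steps over made-up 1-D velocity fields. VoxCPM2's actual solver, schedule, and networks are not described in this card and may differ.

```python
# Toy sketch of N-step sampling with classifier-free guidance (CFG).
# Assumptions: a flow-matching-style sampler with plain Euler steps and
# made-up 1-D velocity fields; VoxCPM2's real solver and model differ.
import math
from typing import Callable

Velocity = Callable[[float, float], float]  # (x, t) -> dx/dt


def guided(v_cond: Velocity, v_uncond: Velocity, cfg_value: float) -> Velocity:
    """Standard CFG rule: v_uncond + cfg_value * (v_cond - v_uncond)."""
    def v(x: float, t: float) -> float:
        vu = v_uncond(x, t)
        return vu + cfg_value * (v_cond(x, t) - vu)
    return v


def euler_sample(v: Velocity, x0: float, inference_timesteps: int) -> float:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 in equal Euler steps."""
    dt = 1.0 / inference_timesteps
    x = x0
    for i in range(inference_timesteps):
        x += v(x, i * dt) * dt
    return x


def v_cond(x: float, t: float) -> float:
    return 1.0 - x  # toy "conditioned" field: pulls x toward 1.0


def v_uncond(x: float, t: float) -> float:
    return -x       # toy "unconditioned" field: pulls x toward 0.0


exact = 1.0 - math.exp(-1.0)                                   # exact ODE solution at t = 1
x7 = euler_sample(v_cond, 0.0, 7)                              # coarse: 7 steps
x50 = euler_sample(v_cond, 0.0, 50)                            # finer: 50 steps
x7_cfg = euler_sample(guided(v_cond, v_uncond, 2.0), 0.0, 7)   # cfg_value = 2.0
```

More steps bring the sample closer to the exact solution at the cost of more model calls, which is the speed/quality trade-off behind `inference_timesteps`. With `cfg_value=2.0`, the guided field in this toy becomes `2 - x`. Guidance therefore pushes the sample further along the conditioned direction than the conditioned field alone would.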
Performance (Apple Silicon)
| Variant | Size | RTF (7 timesteps) |
|---|---|---|
| bf16 | 4.96 GB | 0.48x |
| 8-bit | 3.23 GB | 0.85x |
| 4-bit | 2.30 GB | 0.90x |
RTF = Real-Time Factor, measured here as seconds of audio generated per second of wall-clock time (>1.0 = faster than real time).
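Under this definition, a variant's RTF converts directly into expected generation time. Note that some other tools report the inverse ratio (processing time divided by audio duration), so compare numbers with care. This sketch uses the table's values. `measure_rtf` is a hypothetical helper that times any zero-argument callable returning the seconds of audio it produced.

```python
# Sketch: RTF as defined above (audio seconds per wall-clock second).
import time
from typing import Callable


def rtf(audio_seconds: float, wall_seconds: float) -> float:
    """Values above 1.0 mean faster than real time."""
    return audio_seconds / wall_seconds


def expected_wall_time(audio_seconds: float, rtf_value: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of audio at a given RTF."""
    return audio_seconds / rtf_value


def measure_rtf(generate: Callable[[], float]) -> float:
    """Time a zero-argument callable that returns the seconds of audio it produced."""
    start = time.perf_counter()
    audio_seconds = generate()
    return rtf(audio_seconds, time.perf_counter() - start)


# Using the table: wall-clock time for 10 s of audio per variant.
for name, value in [("bf16", 0.48), ("8-bit", 0.85), ("4-bit", 0.90)]:
    print(f"{name}: ~{expected_wall_time(10.0, value):.1f} s for 10 s of audio")
```

The loop prints about 20.8 s for bf16, 11.8 s for 8-bit, and 11.1 s for 4-bit.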
Original Model
- openbmb/VoxCPM2
- Apache 2.0 License
Converted with mlx-audio.