VoxCPM-0.5B — Core AI (on-device, iPhone + Mac)

OpenBMB's VoxCPM-0.5B converted to Apple's Core AI engine, running fully on-device — iPhone (Apple Neural Engine / GPU, AOT-compiled) and Apple-silicon Mac. No network, no server.

VoxCPM is not a classic vocoder TTS: it pairs a MiniCPM4 language-model backbone with a LocDiT flow-matching diffusion head and an AudioVAE, generating speech through a continuous (token-rate) diffusion loop. This repo ships the whole stack as Core AI model bundles plus the small host-side glue the runtime needs.

Output: 16 kHz mono
License: Apache-2.0 (commercial-friendly), inherited from the base model
Quantization: weight-only int8 on the two LM backbones (the size driver); the diffusion decoder, feature encoder, and AudioVAE stay fp16 — the continuous-feedback path is quantization-sensitive (the same split mlx-community/VoxCPM2 uses).
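
As a rough illustration of what "weight-only int8" means, the sketch below quantizes a small weight matrix to int8 codes with one floating-point scale per row, then dequantizes it again. The symmetric per-row scheme and the names `quantizeInt8` and `dequantize` are assumptions made for illustration; the card does not say which int8 scheme the conversion scripts use.

```swift
import Foundation

/// Symmetric per-row int8 quantization (rows stand in for output channels).
/// Hypothetical helper: the scheme used by the real conversion is not documented here.
func quantizeInt8(_ weights: [[Float]]) -> (codes: [[Int8]], scales: [Float]) {
    var codes: [[Int8]] = []
    var scales: [Float] = []
    for row in weights {
        let maxAbs = row.map { abs($0) }.max() ?? 0
        // Guard against an all-zero row so we never divide by zero.
        let scale: Float = maxAbs > 0 ? maxAbs / 127 : 1
        scales.append(scale)
        codes.append(row.map { Int8(max(-127, min(127, ($0 / scale).rounded()))) })
    }
    return (codes, scales)
}

/// Dequantize back to Float. Only the stored weights are int8; activations stay floating point.
func dequantize(_ codes: [[Int8]], _ scales: [Float]) -> [[Float]] {
    var out: [[Float]] = []
    for (row, scale) in zip(codes, scales) {
        out.append(row.map { Float($0) * scale })
    }
    return out
}

let w: [[Float]] = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
let (q, s) = quantizeInt8(w)
let restored = dequantize(q, s)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`). That is tolerable for a feed-forward backbone, but per the note above the error matters more in a path that feeds its own output back in.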

Path	What
`macos/voxcpm_base_int8_decode_cl512/`	LM backbone (MiniCPM4, 24L), int8, static-KV decode — JIT `.aimodel` for Mac
`macos/voxcpm_res_int8_decode_cl512/`	Residual LM (6L), int8
`macos/voxcpm_feat_decoder_fp16/`	LocDiT CFM diffusion decoder (10-step Euler + CFG, unrolled), fp16
`macos/voxcpm_feat_encoder_fp16/`	LocEnc + projection (per-frame feedback embed), fp16
`macos/voxcpm_vocoder_fp16_t12/`	AudioVAE decoder (DAC-style, 640× upsample), fp16
`ios/*.h18p.aimodelc/`	The same five bundles, AOT-compiled for iOS (h18p)
`voxcpm_host_glue/`	Token-embedding table + dit/FSQ/stop-head weights (run host-side via Accelerate)
`tokenizer/`	Llama tokenizer (`tokenizer.json` + config)
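
The "10-step Euler + CFG" entry for the feature decoder can be pictured with the sketch below. It integrates a velocity field with fixed Euler steps and combines conditioned and unconditioned predictions by classifier-free guidance. The function name `sampleEulerCFG`, the time direction (0 to 1), the guidance values, and the toy velocity field are all assumptions for illustration. In the shipped bundle the loop is unrolled inside the model and the velocity comes from LocDiT.

```swift
import Foundation

/// (state, time, conditioned?) -> predicted velocity. Stand-in for the diffusion network.
typealias Velocity = ([Float], Float, Bool) -> [Float]

/// Fixed-step Euler integration with classifier-free guidance (hypothetical sketch).
func sampleEulerCFG(start: [Float], steps: Int, guidance: Float, velocity: Velocity) -> [Float] {
    var x = start
    let dt = 1 / Float(steps)
    for i in 0..<steps {
        let t = Float(i) * dt
        let vCond = velocity(x, t, true)
        let vUncond = velocity(x, t, false)
        for j in x.indices {
            // CFG: extrapolate from the unconditioned toward the conditioned prediction.
            let v = vUncond[j] + guidance * (vCond[j] - vUncond[j])
            x[j] += dt * v
        }
    }
    return x
}

// Toy field: a straight-line flow toward `target` when conditioned, toward zero otherwise.
let target: [Float] = [1, -2]
let toy: Velocity = { x, t, conditioned in
    let goal: [Float] = conditioned ? target : [0, 0]
    return zip(goal, x).map { ($0 - $1) / (1 - t) }
}

let plain = sampleEulerCFG(start: [0, 0], steps: 10, guidance: 1, velocity: toy)
let guided = sampleEulerCFG(start: [0, 0], steps: 10, guidance: 2, velocity: toy)
```

With guidance 1 the toy sampler lands on the conditioned target. With guidance 2 it overshoots to twice the target, which is the extrapolation CFG performs.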

The two prefill bundles are intentionally not shipped. Prefill instead runs through the q=1 decode bundles one token at a time: because attention is causal, step-by-step prefill gives the same result as batched prefill, and text length is not tied to a fixed prefill shape.
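
The equivalence between step-by-step and batched prefill can be checked on a toy single-head attention layer. The sketch below is a simplification made for illustration: queries, keys, and values are the raw token vectors (no learned projections), and `batchedCausal` and `stepwise` are hypothetical names, not part of the shipped runtime.

```swift
import Foundation

func dot(_ a: [Float], _ b: [Float]) -> Float {
    var sum: Float = 0
    for (x, y) in zip(a, b) { sum += x * y }
    return sum
}

/// Softmax-weighted sum of `values` for a single query (q = 1).
func attend(query: [Float], keys: [[Float]], values: [[Float]]) -> [Float] {
    let scores = keys.map { dot(query, $0) }
    let maxScore = scores.max() ?? 0
    let weights = scores.map { exp($0 - maxScore) }
    let total = weights.reduce(0, +)
    var out = [Float](repeating: 0, count: values[0].count)
    for (w, v) in zip(weights, values) {
        for j in out.indices { out[j] += (w / total) * v[j] }
    }
    return out
}

/// Batched prefill: position i attends to positions 0...i (the causal mask).
func batchedCausal(_ xs: [[Float]]) -> [[Float]] {
    xs.indices.map { i in
        attend(query: xs[i], keys: Array(xs[0...i]), values: Array(xs[0...i]))
    }
}

/// Step-by-step prefill: feed one token at a time and grow a KV cache.
func stepwise(_ xs: [[Float]]) -> [[Float]] {
    var cache: [[Float]] = []
    var outputs: [[Float]] = []
    for x in xs {
        cache.append(x)
        outputs.append(attend(query: x, keys: cache, values: cache))
    }
    return outputs
}

let tokens: [[Float]] = [[1, 0], [0.5, 0.5], [0, 2]]
let batched = batchedCausal(tokens)
let stepped = stepwise(tokens)
```

Both paths see exactly the same keys and values at every position, so `batched` and `stepped` match. The trade-off is latency: stepping runs one forward pass per prompt token instead of one pass for the whole prompt.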

Usage

The easiest path is the coreai-audio app from coreai-model-zoo (the "Voice" tab) together with CoreAIKit:

```swift
import CoreAIKit

let tts = try await VoxCPMTTS(paths: .standard(artifactsRoot: modelRoot))   // macOS (.aimodel)
// let tts = try await VoxCPMTTS(paths: .aot(root: modelRoot, arch: "h18p")) // iOS (.aimodelc)
let pcm = try await tts.synthesize("On device speech synthesis, running entirely on your iPhone.")
// pcm: [Float] @ 16 kHz mono
```
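
To listen to the result outside the app, the returned samples can be wrapped in a WAV container. The helper below is a minimal sketch that uses only Foundation. The name `wavData` is hypothetical, and it assumes the samples are mono floats in the range [-1, 1] at 16 kHz, as the comment above states.

```swift
import Foundation

/// Encode mono Float samples in [-1, 1] as a 16-bit PCM WAV file held in memory.
func wavData(from samples: [Float], sampleRate: UInt32 = 16_000) -> Data {
    var data = Data()
    // Append any fixed-width integer in little-endian byte order, as RIFF requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    let byteCount = UInt32(samples.count * 2)
    data.append(contentsOf: Array("RIFF".utf8))
    append(36 + byteCount)          // remaining file size after this field
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))              // fmt chunk size
    append(UInt16(1))               // format tag: integer PCM
    append(UInt16(1))               // channels: mono
    append(sampleRate)
    append(sampleRate * 2)          // byte rate = rate * channels * 2 bytes
    append(UInt16(2))               // block align
    append(UInt16(16))              // bits per sample
    data.append(contentsOf: Array("data".utf8))
    append(byteCount)
    for s in samples {
        let clamped = max(-1, min(1, s))
        append(Int16((clamped * 32767).rounded()))
    }
    return data
}

let demo = wavData(from: [0, 1, -1])
```

In an app, `try wavData(from: pcm).write(to: url)` would save the synthesized audio to a file URL of your choice.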

The conversion scripts and the Swift host are in the zoo (conversion/voxcpm/) and CoreAIKit.

Notes

Plain TTS only (fixed speaker). VoxCPM's voice-cloning branch is not included yet and is left as a follow-on.
Per-step quality is fp16-equivalent: the int8 LM outputs have cosine similarity > 0.999 against the fp32 reference. Whole-utterance output is natural-sounding speech.
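
For reference, the cosine-similarity metric quoted above can be computed as in the sketch below. The function name is hypothetical, and which tensors were compared (hidden states, logits, or something else) is not stated in this card, so the vectors here are placeholders.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

let parallel = cosineSimilarity([1, 2, 3], [2, 4, 6])    // same direction
let orthogonal = cosineSimilarity([1, 0], [0, 1])        // perpendicular
```

A value near 1 means the quantized output points in almost the same direction as the reference. The metric ignores overall scale, so it is usually read alongside a listening test.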
Community port — not an official Apple model.

Acknowledgements

OpenBMB / VoxCPM. Built on Apple's Core AI.


Model tree for mlboydaisuke/VoxCPM-0.5B-CoreAI

Base model: openbmb/MiniCPM4-0.5B
Finetuned: openbmb/VoxCPM-0.5B
Finetuned: this model
