3-pass diagnostic: naturalize utterances + add per-model raw transcripts vs ground truth (errors highlighted); correct the P2 English OOD finding 5e85c92 verified Luigi committed about 8 hours ago
Add 3-pass phonemic diagnostic (zh 38/38, en 39/39, entities) + teacher/v1/v2 benchmark (X-ASR+Breeze) + diagnosis 6ccbdb4 verified Luigi committed about 8 hours ago
Add pronunciation diagnostic: X-ASR vs Breeze-ASR-25 cross-check (teacher/v1-4.63M/v2-6.85M) on the entity sentence 42f03fa verified Luigi committed about 8 hours ago
README: full v4 architecture/training/dataset + reproduction; v3 kept in v3_4.6M/ a69ea24 verified Luigi committed about 11 hours ago
Keep best 4.63M (v3-30k) ONNX in v3_4.6M/ for record/rollback e6907ae verified Luigi committed about 11 hours ago
README: v4-50k now default (6.85M, clarity+timing); v3 kept for rollback 703dbbc verified Luigi committed about 11 hours ago
Default -> v4-50k (6.85M: +contextual-predictors, mrstft clarity, from-scratch); better timing+clarity by ear vs v3 441bf76 verified Luigi committed about 11 hours ago
card: v3 pronunciation fix (ㄜ/ㄟ/ㄡ + ㄭ), zh-CER .106->.087 4c3ae82 verified Luigi committed 1 day ago
Card: ship 60k (clarity PESQ 3.22 + zh-CER .106), update metrics 84e81f3 verified Luigi committed 3 days ago
Clarity: ship 60k acoustic (sharper mel) - acoustic_decoder.onnx 75bb609 verified Luigi committed 3 days ago
Clarity: ship 60k acoustic (sharper mel) - acoustic_encoder.onnx 0c9e00f verified Luigi committed 3 days ago
Model card: 24kHz CC0 release, entity-coverage corpus (29.5k), updated metrics 5154a85 verified Luigi committed 3 days ago
CC0 24kHz model: meta.json (acoustic 35k + vocoder final) c089244 verified Luigi committed 3 days ago
CC0 24kHz model: vocoder.onnx (acoustic 35k + vocoder final) 7364c7a verified Luigi committed 3 days ago
CC0 24kHz model: acoustic_decoder.onnx (acoustic 35k + vocoder final) 259414b verified Luigi committed 3 days ago
CC0 24kHz model: acoustic_encoder.onnx (acoustic 35k + vocoder final) 60622b0 verified Luigi committed 3 days ago
README: reference voice = Mozilla Common Voice zh-TW (CC0/public-domain, commercial-clear), replacing Edge-TTS 41a550e verified Luigi committed 3 days ago
gen: pre-normalize text via text_norm (correct entity reading; teacher audio matches manifest) b5755f2 verified Luigi committed 3 days ago
fix: aligner uses text_norm.normalize (entity normalizer) f8f98a9 verified Luigi committed 3 days ago
reproduction: actual VoxCPM2-TW pipeline scripts + master run + eval set 361d565 verified Luigi committed 3 days ago
reproduction: actual VoxCPM2-TW pipeline scripts + master run + eval set b1f6288 verified Luigi committed 3 days ago
reproduction: actual VoxCPM2-TW pipeline scripts + master run + eval set d12d34e verified Luigi committed 3 days ago
reproduction: actual VoxCPM2-TW pipeline scripts + master run + eval set 383648f verified Luigi committed 3 days ago