Fabio Augusto Suizu
PRO
fabiosuizu
2 followers · 0 following
AI & ML interests
None yet
Recent Activity
posted an update 6 days ago
Open Pronunciation Assessment API: 17MB model, sub-300ms, phoneme-level scoring

Hi everyone! I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.

**What it does**: Scores English pronunciation at 4 levels of granularity: phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.

**Key specs**:
- 17MB total model size, runs entirely on CPU
- 257ms median inference latency
- Exceeds human inter-annotator agreement at phone-level (+4.5%) and sentence-level (+5.2%)
- Benchmarked on standard academic datasets (2,500+ test utterances)
- Validated across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)

**Architecture**: Proprietary ML pipeline optimized for pronunciation assessment. The entire engine runs in 17MB: no GPU required, no large foundation models needed.

**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment

The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.

**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.

Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level

Thanks!
updated a Space 7 days ago
fabiosuizu/pronunciation-assessment
posted an update 9 days ago
Hi everyone! I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.

**What it does**: Scores English pronunciation at 4 levels of granularity: phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.

**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at phone-level (+4.5%) and sentence-level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)

**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency: the entire pipeline runs in 17MB.

**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment

The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.

**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.

Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level

Thanks!
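
For readers unfamiliar with the first three stages named in the architecture line (CTC forced alignment, Viterbi decoding, GOP scoring), here is a minimal sketch of how they can fit together. It is not the author's implementation. It assumes frame-level log posteriors from some CTC acoustic model, a blank symbol at index 0, and one common GOP variant (mean log ratio of the target phone to the best competing non-blank phone over the aligned frames). The function names `ctc_forced_align` and `gop_scores` are hypothetical, and the MLP/XGBoost heads are not shown.

```python
import numpy as np

BLANK = 0  # assumption: index of the CTC blank symbol


def ctc_forced_align(log_probs, targets):
    """Viterbi-align `targets` (phone ids) to frames under the CTC topology.

    log_probs: array of shape (T, V) with per-frame log posteriors.
    Returns a list of length T holding the index into `targets` that each
    frame is assigned to, or -1 where the frame is aligned to blank.
    """
    assert len(targets) > 0, "need at least one target phone"
    T = log_probs.shape[0]
    # Extended label sequence: blank, p1, blank, p2, ..., blank
    ext = [BLANK]
    for p in targets:
        ext += [p, BLANK]
    S = len(ext)

    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_probs[0, ext[0]]
    score[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1, s], s)]  # stay in the same state
            if s >= 1:
                cands.append((score[t - 1, s - 1], s - 1))  # advance by one
            # Skipping a blank is only legal between two different phones.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append((score[t - 1, s - 2], s - 2))
            best, prev = max(cands)
            score[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = prev

    # A valid path ends on the last phone or on the trailing blank.
    s = S - 1 if score[T - 1, S - 1] >= score[T - 1, S - 2] else S - 2
    path = []
    for t in range(T - 1, -1, -1):
        path.append(s)
        s = int(back[t, s])
    path.reverse()
    return [-1 if ext[s] == BLANK else (s - 1) // 2 for s in path]


def gop_scores(log_probs, targets, alignment):
    """Per-phone GOP: mean over aligned frames of
    log p(target) - max over non-blank phones of log p(phone).
    Scores are <= 0; 0 means the target was the top phone in every frame."""
    scores = []
    for i, p in enumerate(targets):
        frames = [t for t, a in enumerate(alignment) if a == i]
        if not frames:  # phone received no frames (e.g. T too short)
            scores.append(float("-inf"))
            continue
        seg = log_probs[frames]
        best_competitor = np.delete(seg, BLANK, axis=1).max(axis=1)
        scores.append(float(np.mean(seg[:, p] - best_competitor)))
    return scores


# Toy input: 6 frames, vocabulary {blank, 1, 2, 3}; the audio "says" 1 then 2.
toy = np.log(np.array([
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.85, 0.05, 0.05, 0.05],
]))
alignment = ctc_forced_align(toy, [1, 2])
print(alignment, gop_scores(toy, [1, 2], alignment))
```

In a full system the raw GOP values would typically be combined with other features and mapped to a 0-100 scale by a learned regressor, which is presumably the role of the ensemble heads mentioned in the post.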
Organizations
None yet
fabiosuizu's Spaces (1)
pinned
Running
4
Speech AI
🎙
Pronunciation scoring + Speech-to-Text + Text-to-Speech