Contrastive Regularization for Accent-Robust ASR
Abstract
Supervised contrastive learning serves as an effective regularization method to enhance accent robustness in ASR systems without requiring architectural changes or explicit accent labels.
ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with up to 25-29% relative reduction under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
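As a rough illustration of the two quantities the abstract mentions, the sketch below computes a standard SupCon loss over utterance-level embeddings and a within-group cosine dispersion. It is not the authors' implementation. Several things here are assumptions made for illustration: that positives are utterances sharing the same transcript ID, that the auxiliary term is added to the CTC loss with a scalar `weight`, that dispersion is the mean pairwise cosine distance within a transcript group, and all function names and default values. Plain Python is used to keep the example self-contained; a real training loop would use a tensor library.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are returned unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss over one batch.

    embeddings: list of utterance-level vectors (e.g. pooled encoder outputs).
    labels: one group ID per utterance. ASSUMPTION: a transcript ID, so that
            the same sentence spoken with different accents forms positives.
    Anchors without any positive in the batch are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other utterance.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def combined_loss(ctc_loss, embeddings, labels, weight=0.1, temperature=0.1):
    # ASSUMPTION: the auxiliary term is simply added with a scalar weight.
    return ctc_loss + weight * supcon_loss(embeddings, labels, temperature)

def within_group_cosine_dispersion(embeddings, labels):
    """Mean pairwise cosine distance (1 - cos) among same-label embeddings.

    ASSUMPTION: one plausible reading of "within-transcript cosine
    dispersion"; the paper's exact definition may differ.
    """
    z = [l2_normalize(e) for e in embeddings]
    dists = [
        1.0 - dot(z[i], z[j])
        for i in range(len(z))
        for j in range(i + 1, len(z))
        if labels[i] == labels[j]
    ]
    return sum(dists) / len(dists) if dists else 0.0
```

Under this reading, a batch in which same-transcript utterances already sit close together yields a small loss and a small dispersion, while a batch in which they are scattered yields larger values for both. That matches the intuition that the regularizer pulls representations of the same content together regardless of accent.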
Models citing this paper 8
ThaiVanPhat95/wav2vec2-robust-uwb-supcon-hybrid-4gram
Datasets citing this paper 1
ThaiVanPhat95/synthetic-atc-speech