arxiv:2605.03297

Contrastive Regularization for Accent-Robust ASR

Published on May 5


Abstract

Supervised contrastive learning serves as an effective regularization method to enhance accent robustness in ASR systems without requiring architectural changes or explicit accent labels.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with up to 25 to 29% relative reduction under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
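The abstract does not spell out the loss formulation, so the following is only a minimal, dependency-free sketch under stated assumptions. It uses the standard SupCon loss of Khosla et al. (2020) on utterance-level vectors obtained by mean-pooling frame-level encoder outputs, treats utterances that share a transcript as positives (an assumption suggested by the within-transcript analysis and the absence of accent labels, not confirmed here), and adds the contrastive term to the CTC loss with a weight. The function names, the `weight` and `temperature` defaults, and the centroid-based definition of dispersion are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Hypothetical pooling: average frame-level encoder outputs into one utterance vector."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def supcon_loss(embeddings: Sequence[Vector], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss (Khosla et al., 2020, L_out form).

    Assumption: labels are transcript IDs, so utterances of the same sentence
    by different (possibly differently accented) speakers are positives.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors with no positive in the batch are skipped
        logits = [dot(z[i], z[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over every sample except the anchor.
        others = [logits[j] for j in range(n) if j != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(x - m) for x in others))
        # Average negative log-probability of the anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(ctc_loss: float, embeddings: Sequence[Vector],
                  transcript_ids: Sequence[int], weight: float = 0.1,
                  temperature: float = 0.1) -> float:
    """CTC loss plus a weighted SupCon regularizer (weight is a hypothetical default)."""
    return ctc_loss + weight * supcon_loss(embeddings, transcript_ids, temperature)


def within_transcript_dispersion(embeddings: Sequence[Vector],
                                 transcript_ids: Sequence[int]) -> float:
    """Hypothetical metric: mean cosine distance to the transcript centroid,
    averaged over transcripts (lower means a more compact geometry)."""
    groups: Dict[int, List[List[float]]] = {}
    for e, t in zip(embeddings, transcript_ids):
        groups.setdefault(t, []).append(l2_normalize(e))
    per_group = []
    for members in groups.values():
        centroid = l2_normalize(mean_pool(members))
        per_group.append(sum(1.0 - dot(m, centroid) for m in members) / len(members))
    return sum(per_group) / len(per_group)


# Toy batch: two transcripts, each spoken by two speakers.
compact = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
scattered = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
ids = [0, 0, 1, 1]
print(supcon_loss(compact, ids) < supcon_loss(scattered, ids))  # True
print(within_transcript_dispersion(compact, ids))  # 0.0
```

In the toy batch, embeddings that cluster by transcript get both a lower SupCon loss and zero dispersion, which is the direction of effect the abstract describes. In an actual fine-tuning loop the same computation would run on framework tensors (for example in PyTorch) so that gradients flow back into the encoder; plain lists are used here only to keep the sketch self-contained.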