MetaSICL: Adapting Audiroty LLM via Meta Speech In-Context Learning
Abstract
Audio large language models face challenges in low-resource settings, but in-context learning approaches can improve zero-shot performance through demonstration-based adaptation without direct fine-tuning.
Auditory Large Language Models (LLMs) have demonstrated strong performance across a wide range of speech and audio understanding tasks. Nevertheless, they often struggle when applied to low-resource tasks. In case in-domain labeled data are scarce or mismatched with the true test distribution, direct fine-tuning can be brittle. In-Context Learning (ICL) provides a training-free, inference-time solution by adapting auditory LLMs through conditioning on a few in-domain demonstrations. In this work, we first show that Vanilla ICL, improves zero-shot performance across diverse speech and audio tasks for selected models which suggest that this ICL adaptation capability can be generalized to multimodal setting. Building on this, we propose Meta Speech In-Context Learning (MetaSICL), a post-training recipe utilizes only high resource speech data from various tasks intending to strengthen model's in-context learning capability. Experiments indicate our proposed method outperforms direct fine-tuning in low-resource scenario.
Get this paper in your agent:
hf papers read 2601.18904 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper