arxiv:2603.11409

Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue

Published on Mar 12

AI-generated summary

Context-aware turn-taking in multi-party conversations must be trained explicitly rather than left to emerge on its own, as shown by a benchmark of over 120K labeled conversations and a supervised fine-tuning approach with reasoning traces.

Abstract

Existing voice AI assistants treat every detected pause as an invitation to speak. This works in dyadic dialogue, but in multi-party settings, where an AI assistant participates alongside multiple speakers, pauses are abundant and ambiguous. An assistant that speaks on every pause becomes disruptive rather than useful. In this work, we formulate context-aware turn-taking: at every detected pause, given the full conversation context, our method decides whether the assistant should speak or stay silent. We introduce a benchmark of over 120K labeled conversations spanning three multi-party corpora. Evaluating eight recent large language models, we find that they consistently fail at context-aware turn-taking under zero-shot prompting. We then propose a supervised fine-tuning approach with reasoning traces, improving balanced accuracy by up to 23 percentage points. Our findings suggest that context-aware turn-taking is not an emergent capability; it must be explicitly trained.
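The abstract reports results in balanced accuracy, which matters here because speak and stay-silent decisions are unlikely to be evenly distributed. The sketch below is illustrative only: it uses the standard definition of balanced accuracy (the mean of per-class recall), and the label names, the `always_speak` policy, and the toy data are assumptions made for this example rather than anything taken from the paper or its benchmark.

```python
from collections import Counter

# Hypothetical label names for the two decisions at a detected pause.
SPEAK, SILENT = "speak", "silent"


def balanced_accuracy(gold, predicted):
    """Mean of per-class recall, computed over the classes present in gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return sum(hits[label] / totals[label] for label in totals) / len(totals)


def always_speak(context):
    """Baseline that treats every pause as an invitation to speak."""
    return SPEAK


# Toy data (invented): 8 pauses where silence is correct, 2 where speaking is.
gold = [SILENT] * 8 + [SPEAK] * 2
pred = [always_speak(None) for _ in gold]

plain = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"plain accuracy:    {plain:.2f}")
print(f"balanced accuracy: {balanced_accuracy(gold, pred):.2f}")
```

On this toy data the speak-on-every-pause baseline gets every speak case right and every silent case wrong, so its balanced accuracy is 0.5, the same as a constant stay-silent policy would get. That is why the metric is a reasonable fit for this kind of decision task: a model has to do well on both classes to score above chance.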
