Speaker diarization for segments under 1 s seems inaccurate

by linlinsong - opened Jan 22

Jan 22

Excellent work, thank you! The timestamps are very precise, and the speaker diarization is generally quite good. However, it seems less accurate for brief interjections of about one second in the middle of a sentence. Would it be possible to address this by adjusting the inference parameters?

WestZhang

Jan 22

Thanks for your interest!

Short, frequent speaker interchanges (e.g., ~1s interjections within an ongoing sentence) are still challenging for the current model and typically require targeted training data to improve. From an inference-only perspective, parameter tuning is unlikely to reliably fix this issue at the moment. Sorry about that. o(╥﹏╥)o

That said, we’re actively iterating on the model, and upcoming versions should deliver better performance for this scenario. ヾ(◍°∇°◍)ﾉﾞ
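
For anyone who wants to gauge how often this pattern shows up in their own audio, here is a minimal sketch. It assumes the diarization output has already been converted to a list of `(start, end, speaker)` tuples; that format and the helper name `find_short_interjections` are hypothetical and not part of the model's API. It only flags short segments that sit between two segments of the same other speaker so they can be inspected. It does not correct the labels.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]


def find_short_interjections(segments: List[Segment], max_dur: float = 1.0) -> List[int]:
    """Return indices of segments no longer than max_dur seconds whose
    previous and next segments (by start time) share one other speaker."""
    # Work on indices sorted by start time so the input order does not matter.
    ordered = sorted(range(len(segments)), key=lambda i: segments[i][0])
    hits = []
    for pos in range(1, len(ordered) - 1):
        prev_seg = segments[ordered[pos - 1]]
        cur = segments[ordered[pos]]
        next_seg = segments[ordered[pos + 1]]
        duration = cur[1] - cur[0]
        same_neighbors = prev_seg[2] == next_seg[2]
        if duration <= max_dur and same_neighbors and cur[2] != prev_seg[2]:
            hits.append(ordered[pos])
    return hits


# Made-up example: speaker B interjects for 0.8 s inside speaker A's sentence.
example = [(0.0, 4.2, "A"), (4.2, 5.0, "B"), (5.0, 9.0, "A"), (9.0, 12.0, "B")]
print(find_short_interjections(example))  # [1]
```

Listening to the flagged spans by hand gives a rough sense of how much of a recording falls into the hard case described above.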
