Is there any parameter to tune for more accurate diarization

#13

by lingxue156 - opened Jan 28

Jan 28

Thank you for this excellent model, which combines semantic understanding with speaker diarization! I read your paper and noticed that you used HDBSCAN clustering during pre-training with a fixed threshold of 0.67. I'm currently testing the model on meeting diarization, and I've found that it leans a bit too conservative: it tends to identify fewer speakers than are actually present. Even when voices are fairly distinct (for example, different female speakers), they often end up merged into a single speaker.

So I was wondering: is the clustering threshold adjustable? And what other parameters could I tweak to make the diarization more aggressive, that is, more willing to split similar-sounding voices into separate speakers? 😊
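To make the trade-off concrete, here is a minimal sketch of how a distance threshold controls the number of detected speakers. It is a hypothetical stand-in, not the model's pipeline: it uses plain average-linkage agglomerative clustering with a cosine-distance cutoff instead of HDBSCAN (whose parameters behave differently), and the function names, threshold values and toy 2-D "embeddings" are all made up for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_speakers(embeddings, threshold):
    """Average-linkage agglomerative clustering (hypothetical stand-in, not HDBSCAN).

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`, so a lower threshold leaves more clusters (more speakers).
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None  # (average distance, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(cosine_distance(embeddings[a], embeddings[b])
                            for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        if avg > threshold:
            break  # nearest clusters are too far apart: keep them as different speakers
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


def unit(deg):
    """Hypothetical 2-D 'speaker embedding' pointing at the given angle."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]


# Toy segments: two similar-sounding speakers (0-5 deg, 30-35 deg) and one distinct speaker (90-95 deg).
segments = [unit(0), unit(5), unit(30), unit(35), unit(90), unit(95)]

print(cluster_speakers(segments, 0.6))  # [[0, 1, 2, 3], [4, 5]]
print(cluster_speakers(segments, 0.1))  # [[0, 1], [2, 3], [4, 5]]
```

With the looser cutoff (0.6) the two similar voices collapse into one speaker, which matches the under-counting described above; the stricter cutoff (0.1) keeps them apart. Whether the model exposes an equivalent knob is exactly the open question here.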
