Pyannote Segmentation 3.0 - MLX
MLX-compatible weights for pyannote/segmentation-3.0 (PyanNet), converted from the official PyTorch Lightning checkpoint with pre-computed SincNet filters.
Model
PyanNet is a speaker segmentation model (~1.5M params) that processes 10-second audio windows and outputs 7-class powerset probabilities per frame, covering up to 3 speakers per window with at most 2 of them speaking at the same time. It is used for both voice activity detection (binary speech/non-speech) and speaker diarization (per-speaker activity).
Architecture: SincNet → BiLSTM (4 layers) → Linear (2 layers) → 7-class softmax
Output classes: non-speech, spk1, spk2, spk3, spk1+2, spk1+3, spk2+3
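As a minimal sketch of how the 7 powerset classes relate to the two use cases above, the snippet below marginalizes one frame of class probabilities into a speech probability and three per-speaker probabilities. The class order follows the list above; the function name `decode_frame` and the marginalization strategy are illustrative assumptions, not the API of this package, which may decode differently (for example by taking the argmax class per frame).

```python
# Speakers active in each powerset class, in the order listed above.
# Indices 0, 1, 2 stand for spk1, spk2, spk3.
POWERSET = [
    (),      # non-speech
    (0,),    # spk1
    (1,),    # spk2
    (2,),    # spk3
    (0, 1),  # spk1+2
    (0, 2),  # spk1+3
    (1, 2),  # spk2+3
]


def decode_frame(probs):
    """Convert 7 powerset probabilities into (speech_prob, per_speaker_probs).

    Hypothetical helper: sums the probability of every class in which a
    given speaker is active.
    """
    if len(probs) != len(POWERSET):
        raise ValueError("expected 7 class probabilities")
    speech = 1.0 - probs[0]  # everything except the non-speech class
    per_speaker = [0.0, 0.0, 0.0]
    for p, speakers in zip(probs, POWERSET):
        for s in speakers:
            per_speaker[s] += p
    return speech, per_speaker


# Made-up frame in which spk1 dominates, with some spk1+spk2 overlap.
frame = [0.10, 0.60, 0.05, 0.00, 0.20, 0.05, 0.00]
speech, per_speaker = decode_frame(frame)
print(round(speech, 2), [round(p, 2) for p in per_speaker])
```

Thresholding `speech` per frame gives a VAD-style decision, while thresholding each entry of `per_speaker` gives the per-speaker activity used for diarization.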
Usage (Swift / MLX)
import SpeechVAD
// Voice Activity Detection
let vad = try await PyannoteVADModel.fromPretrained()
let segments = vad.detectSpeech(audio: samples, sampleRate: 16000)
for seg in segments {
    print("Speech: \(seg.startTime)s - \(seg.endTime)s")
}
// Speaker Diarization (with WeSpeaker embeddings)
let pipeline = try await DiarizationPipeline.fromPretrained()
let result = pipeline.diarize(audio: samples, sampleRate: 16000)
for seg in result.segments {
    print("Speaker \(seg.speakerId): \(seg.startTime)s - \(seg.endTime)s")
}
Part of qwen3-asr-swift.
Conversion
python3 scripts/convert_pyannote.py --token YOUR_HF_TOKEN --upload
Converts the gated pyannote/segmentation-3.0 checkpoint using a custom unpickler (no pyannote.audio dependency required). Key transformations:
- SincNet: pre-compute 80 sinc bandpass filters (40 cos + 40 sin) from 40 learned (low_hz, band_hz) parameter pairs
- Conv1d: transpose weights [O, I, K] → [O, K, I] for MLX channels-last
- BiLSTM: split into forward/backward stacks, sum bias_ih + bias_hh
- Linear/classifier: kept as-is
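Two of the transformations above can be sketched in plain Python on nested lists. This is not the code of `scripts/convert_pyannote.py`, which presumably works on arrays; the helper names `transpose_conv1d` and `fuse_lstm_bias` are hypothetical. Summing the two biases is lossless because PyTorch's LSTM adds `bias_ih` and `bias_hh` to the same gate pre-activation, so a single combined bias produces the same result.

```python
def transpose_conv1d(weight):
    """Reorder a nested-list Conv1d weight from [O, I, K] to [O, K, I].

    O = output channels, I = input channels, K = kernel taps.
    """
    # zip(*out_ch) walks the K taps and gathers the I input channels per tap.
    return [[list(tap) for tap in zip(*out_ch)] for out_ch in weight]


def fuse_lstm_bias(bias_ih, bias_hh):
    """Combine the two PyTorch LSTM bias vectors into a single bias vector."""
    if len(bias_ih) != len(bias_hh):
        raise ValueError("bias vectors must have the same length")
    return [a + b for a, b in zip(bias_ih, bias_hh)]


# Toy weight with O=1, I=2, K=3.
w = [[[1, 2, 3],
      [4, 5, 6]]]
print(transpose_conv1d(w))  # shape becomes O=1, K=3, I=2

# Toy biases; real ones have 512 entries (4 gates x 128 hidden units).
print(fuse_lstm_bias([0.5, 1.0], [0.25, -1.0]))
```

With an array library the transpose is a single axis permutation (axes `(0, 2, 1)`); the list version is used here only to keep the sketch dependency-free.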
Weight Mapping
| PyTorch Key | MLX Key | Shape |
|---|---|---|
| sincnet.conv1d.0.filterbank.* (computed) | sincnet.conv.0.weight | [80, 251, 1] |
| sincnet.conv1d.{1,2}.weight | sincnet.conv.{1,2}.weight | [O, K, I] |
| sincnet.norm1d.{0-2}.* | sincnet.norm.{0-2}.* | varies |
| lstm.weight_ih_l{i} | lstm_fwd.layers.{i}.Wx | [512, I] |
| lstm.weight_hh_l{i} | lstm_fwd.layers.{i}.Wh | [512, 128] |
| lstm.bias_ih_l{i} + bias_hh_l{i} | lstm_fwd.layers.{i}.bias | [512] |
| lstm.*_reverse | lstm_bwd.layers.{i}.* | same |
| linear.{0,1}.* | linear.{0,1}.* | varies |
| classifier.* | classifier.* | [7, 128] |
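The LSTM rows of the mapping can be expressed as a small key-renaming function. The sketch below covers only the `weight_ih` / `weight_hh` keys, since the two bias keys are summed into one tensor rather than renamed one-to-one. The function name `map_lstm_weight_key` and the regular expression are illustrative assumptions, not taken from the conversion script.

```python
import re

# Matches e.g. "lstm.weight_ih_l0" or "lstm.weight_hh_l3_reverse".
_LSTM_KEY = re.compile(r"^lstm\.(weight_ih|weight_hh)_l(\d+)(_reverse)?$")
_MLX_NAMES = {"weight_ih": "Wx", "weight_hh": "Wh"}


def map_lstm_weight_key(pt_key):
    """Return the MLX key for a PyTorch LSTM weight key, or None if it is not one."""
    m = _LSTM_KEY.match(pt_key)
    if m is None:
        return None
    kind, layer, reverse = m.groups()
    # Keys with the "_reverse" suffix belong to the backward stack.
    stack = "lstm_bwd" if reverse else "lstm_fwd"
    return f"{stack}.layers.{layer}.{_MLX_NAMES[kind]}"


for key in ("lstm.weight_ih_l0", "lstm.weight_hh_l3_reverse", "classifier.weight"):
    print(key, "->", map_lstm_weight_key(key))
```

Keys that return `None` here (SincNet, linear, classifier, and the LSTM biases) would be handled by the other rows of the table.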
License
The original pyannote segmentation model is released under the MIT License.