Pyannote Segmentation 3.0 - MLX

MLX-compatible weights for pyannote/segmentation-3.0 (PyanNet), converted from the official PyTorch Lightning checkpoint with pre-computed SincNet filters.

Model

PyanNet is a speaker segmentation model (~1.5M parameters) that processes 10-second audio windows and outputs, for each frame, 7-class powerset probabilities covering up to 3 speakers per window, at most 2 of them active at the same time. It is used for both voice activity detection (binary speech/non-speech) and speaker diarization (per-speaker activity).
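To make the windowing concrete, here is a minimal sketch of cutting a recording into 10-second windows at 16 kHz (160,000 samples each). The function name `split_into_windows` and the 5-second hop are illustrative assumptions, not part of this package; the Swift pipeline may use a different hop and padding strategy.

```python
SAMPLE_RATE = 16_000                            # Hz, as passed in the usage example
WINDOW_SECONDS = 10                             # window length stated above
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 160,000 samples per window


def split_into_windows(samples, hop_seconds=5.0):
    """Yield (start_sample, window) pairs of fixed-length windows.

    hop_seconds is an assumed value chosen for illustration only.
    The last window is zero-padded to the full window length.
    """
    hop = int(hop_seconds * SAMPLE_RATE)
    start = 0
    while True:
        window = list(samples[start:start + WINDOW_SAMPLES])
        window += [0.0] * (WINDOW_SAMPLES - len(window))  # pad the tail
        yield start, window
        if start + WINDOW_SAMPLES >= len(samples):
            break
        start += hop
```

Overlapping windows let per-window predictions be averaged where they overlap, at the cost of extra inference passes.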

Architecture: SincNet → BiLSTM(4 layers) → Linear(2 layers) → 7-class softmax

Output classes: non-speech, spk1, spk2, spk3, spk1+2, spk1+3, spk2+3
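As an illustration of how the powerset output can be read, the sketch below decodes one frame of 7 class probabilities into a speech probability (for VAD) and per-speaker activity (for diarization). The class order follows the list above; the function names are hypothetical, and the Swift package may decode differently.

```python
# Index -> set of active speakers, in the class order listed above.
POWERSET_CLASSES = [
    frozenset(),        # non-speech
    frozenset({1}),     # spk1
    frozenset({2}),     # spk2
    frozenset({3}),     # spk3
    frozenset({1, 2}),  # spk1+2
    frozenset({1, 3}),  # spk1+3
    frozenset({2, 3}),  # spk2+3
]


def speech_probability(frame_probs):
    """Binary VAD score: everything except the non-speech class."""
    return 1.0 - frame_probs[0]


def speaker_activity(frame_probs, num_speakers=3):
    """Marginal probability per speaker: sum over classes containing them."""
    return [
        sum(p for p, spk in zip(frame_probs, POWERSET_CLASSES) if s in spk)
        for s in range(1, num_speakers + 1)
    ]


def active_speakers(frame_probs):
    """Hard decision: the speakers in the most probable class."""
    best = max(range(len(frame_probs)), key=lambda i: frame_probs[i])
    return sorted(POWERSET_CLASSES[best])
```

For example, a frame whose most probable class is spk1+2 decodes to speakers `[1, 2]`, and its speech probability is one minus the non-speech probability.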

Usage (Swift / MLX)

import SpeechVAD

// `samples` is assumed to hold mono audio sampled at the 16 kHz rate passed below.
// Voice Activity Detection
let vad = try await PyannoteVADModel.fromPretrained()
let segments = vad.detectSpeech(audio: samples, sampleRate: 16000)
for seg in segments {
    print("Speech: \(seg.startTime)s - \(seg.endTime)s")
}

// Speaker Diarization (with WeSpeaker embeddings)
let pipeline = try await DiarizationPipeline.fromPretrained()
let result = pipeline.diarize(audio: samples, sampleRate: 16000)
for seg in result.segments {
    print("Speaker \(seg.speakerId): \(seg.startTime)s - \(seg.endTime)s")
}

Part of qwen3-asr-swift.

Conversion

python3 scripts/convert_pyannote.py --token YOUR_HF_TOKEN --upload

The script converts the gated pyannote/segmentation-3.0 checkpoint using a custom unpickler, so pyannote.audio does not need to be installed. Key transformations:

  • SincNet: pre-compute 80 sinc bandpass filters (40 cos + 40 sin) from 40 learned (low_hz, band_hz) parameter pairs
  • Conv1d: transpose weights [O, I, K] → [O, K, I] for MLX channels-last
  • BiLSTM: split into forward/backward stacks, sum bias_ih + bias_hh
  • Linear/classifier: kept as-is
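Two of the transformations above can be sketched in a few lines. This is not the code from scripts/convert_pyannote.py: the helper names are hypothetical, and plain nested lists stand in for tensors so the example has no dependencies. Summing the biases is safe because PyTorch's LSTM computes each gate as W_ih·x + b_ih + W_hh·h + b_hh, so the two bias vectors only ever appear as a sum.

```python
def transpose_conv1d_weight(w):
    """Convert a Conv1d weight from [O, I, K] to [O, K, I].

    w is a nested list indexed as w[out_channel][in_channel][tap].
    """
    return [[list(taps) for taps in zip(*out_ch)] for out_ch in w]


def fuse_lstm_biases(bias_ih, bias_hh):
    """Element-wise sum of the two PyTorch LSTM bias vectors.

    For this model each vector has 512 entries (4 gates x 128 hidden units).
    """
    if len(bias_ih) != len(bias_hh):
        raise ValueError("bias_ih and bias_hh must have the same length")
    return [a + b for a, b in zip(bias_ih, bias_hh)]


# Example: a [1, 2, 3] weight (O=1, I=2, K=3) becomes [1, 3, 2].
w_pt = [[[1, 2, 3], [4, 5, 6]]]
w_mlx = transpose_conv1d_weight(w_pt)
```

With a real checkpoint the same reordering is a single axis permutation on the tensor; the list version only makes the index mapping explicit.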

Weight Mapping

| PyTorch Key | MLX Key | Shape |
|---|---|---|
| sincnet.conv1d.0.filterbank.* | (computed) sincnet.conv.0.weight | [80, 251, 1] |
| sincnet.conv1d.{1,2}.weight | sincnet.conv.{1,2}.weight | [O, K, I] |
| sincnet.norm1d.{0-2}.* | sincnet.norm.{0-2}.* | varies |
| lstm.weight_ih_l{i} | lstm_fwd.layers.{i}.Wx | [512, I] |
| lstm.weight_hh_l{i} | lstm_fwd.layers.{i}.Wh | [512, 128] |
| lstm.bias_ih_l{i} + bias_hh_l{i} | lstm_fwd.layers.{i}.bias | [512] |
| lstm.*_reverse | lstm_bwd.layers.{i}.* | same |
| linear.{0,1}.* | linear.{0,1}.* | varies |
| classifier.* | classifier.* | [7, 128] |
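The renaming rules in this mapping can be expressed as a small function. The sketch below is an illustration built only from the key patterns listed above; `map_key` and `fused_bias_key` are hypothetical names, and the actual conversion script may be organized differently.

```python
import re

LSTM_WEIGHT = re.compile(r"^lstm\.weight_(ih|hh)_l(\d+)(_reverse)?$")
LSTM_BIAS = re.compile(r"^lstm\.bias_(ih|hh)_l(\d+)(_reverse)?$")


def map_key(pt_key):
    """Return the MLX key for a PyTorch key.

    Returns None for tensors that are not copied one-to-one: the SincNet
    filterbank parameters (filters are computed) and the LSTM biases
    (bias_ih and bias_hh are summed into a single bias).
    """
    if pt_key.startswith("sincnet.conv1d.0.filterbank") or LSTM_BIAS.match(pt_key):
        return None
    m = LSTM_WEIGHT.match(pt_key)
    if m:
        kind, layer, reverse = m.groups()
        stack = "lstm_bwd" if reverse else "lstm_fwd"
        name = "Wx" if kind == "ih" else "Wh"
        return f"{stack}.layers.{layer}.{name}"
    if pt_key.startswith("sincnet.conv1d."):
        return pt_key.replace("sincnet.conv1d.", "sincnet.conv.", 1)
    if pt_key.startswith("sincnet.norm1d."):
        return pt_key.replace("sincnet.norm1d.", "sincnet.norm.", 1)
    return pt_key  # any other key (linear.*, classifier.*) passes through unchanged


def fused_bias_key(layer, reverse=False):
    """MLX key that receives bias_ih + bias_hh for one layer and direction."""
    stack = "lstm_bwd" if reverse else "lstm_fwd"
    return f"{stack}.layers.{layer}.bias"
```

For instance, `map_key("lstm.weight_ih_l2_reverse")` gives `lstm_bwd.layers.2.Wx`, which matches the forward/backward split described in the table.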

License

The original pyannote segmentation model is released under the MIT License.
