aufklarer (Ivan)

posted an update 28 minutes ago

Speaker Diarization and VAD on Apple Silicon — MLX-Native Models

Three MLX-optimized models for on-device speaker diarization and voice activity detection, running natively on Apple Silicon via https://github.com/ivan-digital/qwen3-asr-swift:

- aufklarer/Silero-VAD-v5-MLX: Streaming VAD, 309K params, ~1.2 MB. Processes 32 ms chunks at 23× real-time on M2 Max.
- aufklarer/Pyannote-Segmentation-MLX: Multi-speaker segmentation, ~1.49M params, ~5.7 MB. 7-class powerset output covering up to 3 speakers per window, with at most 2 speaking at once.
- aufklarer/WeSpeaker-ResNet34-LM-MLX: Speaker embedding, ~6.6M params, ~25 MB. 256-dim L2-normalized vectors with BatchNorm fused into Conv2d.
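
To make the 7-class powerset output concrete: with 3 speaker slots and at most 2 of them active at once, there are 1 + 3 + 3 = 7 possible subsets, and each frame is assigned exactly one of them. Below is a minimal Python sketch of that mapping. The class ordering (silence, then single speakers, then pairs) and the function names are assumptions for illustration, not taken from the library.

```python
from itertools import combinations


def powerset_classes(num_speakers=3, max_overlap=2):
    """Enumerate speaker subsets: the empty set first, then singles, then pairs."""
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def decode_frame(class_index, classes, num_speakers=3):
    """Turn one predicted class index into a per-speaker 0/1 activity vector."""
    active = classes[class_index]
    return [1 if spk in active else 0 for spk in range(num_speakers)]


# Assumed ordering: (), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)
CLASSES = powerset_classes()
```

Calling `decode_frame` on the argmax of each frame gives a multi-label activity matrix, which is the form a downstream embedding and clustering stage would typically consume.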

Together they form a diarization pipeline: pyannote segments → WeSpeaker embeds → agglomerative clustering links speakers across the recording. ~32 MB total.
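
The clustering step can be sketched roughly as follows, in plain Python. This is an illustration under stated assumptions, not the library's implementation: average linkage, cosine distance, the 0.5 stopping threshold, and the `max_speakers` parameter are all hypothetical choices made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    # For unit-length vectors, cosine similarity is just the dot product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def agglomerative_cluster(embeddings, threshold=0.5, max_speakers=None):
    """Average-linkage agglomerative clustering; returns one label per embedding."""
    vecs = [l2_normalize(v) for v in embeddings]
    clusters = [[i] for i in range(len(vecs))]

    def linkage(c1, c2):
        dists = [cosine_distance(vecs[i], vecs[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, a, b = min(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        over_limit = max_speakers is not None and len(clusters) > max_speakers
        # Stop once the closest pair is too far apart, unless a speaker cap
        # still forces further merging.
        if dist > threshold and not over_limit:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(vecs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Each input vector stands for the embedding of one speech segment; segments that end up with the same label are treated as the same speaker across the recording.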

git clone https://github.com/ivan-digital/qwen3-asr-swift
cd qwen3-asr-swift && swift build -c release

.build/release/audio diarize meeting.wav --max-speakers 4 --json
.build/release/audio vad-stream recording.wav

The library also includes ASR, TTS, multilingual synthesis, forced alignment, and speech-to-speech (PersonaPlex 7B). Apache 2.0.

Full architecture details: https://blog.ivan.digital/speaker-diarization-and-voice-activity-detection-on-apple-silicon-native-swift-with-mlx

Library: https://github.com/ivan-digital/qwen3-asr-swift

updated a dataset about 2 hours ago

aufklarer/central-bank-communications

Viewer • Updated about 2 hours ago • 252k • 800 • 3

updated a model 1 day ago

aufklarer/Pyannote-Segmentation-MLX

Voice Activity Detection • Updated 1 day ago • 17

updated a model 2 days ago

aufklarer/WeSpeaker-ResNet34-LM-MLX

Audio Classification • Updated 1 day ago • 21

published a model 2 days ago

aufklarer/WeSpeaker-ResNet34-LM-MLX

Audio Classification • Updated 1 day ago • 21

updated a model 2 days ago

aufklarer/Silero-VAD-v5-MLX

Voice Activity Detection • Updated 2 days ago • 14

published 2 models 2 days ago

aufklarer/Silero-VAD-v5-MLX

Voice Activity Detection • Updated 2 days ago • 14

aufklarer/Pyannote-Segmentation-MLX

Voice Activity Detection • Updated 1 day ago • 17

updated a model 3 days ago

aufklarer/Qwen3-ForcedAligner-0.6B-4bit

Audio Classification • Updated 2 days ago • 21

published a model 3 days ago

aufklarer/Qwen3-ForcedAligner-0.6B-4bit

Audio Classification • Updated 2 days ago • 21

posted an update 3 days ago

PersonaPlex-7B on Apple Silicon (Swift + MLX Swift)

NVIDIA PersonaPlex is a full-duplex speech-to-speech model — it can listen while it speaks, which enables more natural conversational behaviors like interruptions, overlaps, and quick backchannels.

We put together a native Swift implementation using MLX Swift so it can run locally on Apple Silicon, along with a 4-bit MLX conversion and a small CLI/demo to make it easy to try out.

If you’re interested in on-device voice agents (or just want to see what full-duplex S2S looks like in a real Swift codebase), the details and setup notes are here:

Blog post: https://blog.ivan.digital/nvidia-personaplex-7b-on-apple-silicon-full-duplex-speech-to-speech-in-native-swift-with-mlx-0aa5276f2e23

Repo: https://github.com/ivan-digital/qwen3-asr-swift

updated a model 3 days ago

aufklarer/PersonaPlex-7B-MLX-4bit

Audio-to-Audio • Updated 3 days ago • 89

updated a model 5 days ago

aufklarer/CosyVoice3-0.5B-MLX-4bit

Text-to-Speech • Updated 5 days ago • 56

published a model 5 days ago

aufklarer/PersonaPlex-7B-MLX-4bit

Audio-to-Audio • Updated 3 days ago • 89

published a model 8 days ago

aufklarer/CosyVoice3-0.5B-MLX-4bit

Text-to-Speech • Updated 5 days ago • 56

posted an update 20 days ago

Context Engineering for Code Agents: Why They Fail and How to Fix Them

Code agents don't fail because they can't code — they fail because their context turns into a junk drawer.

I wrote a practical survey covering the emerging discipline of context engineering for agentic hybrid applications: the techniques, papers, and architectural patterns that keep long-running code agents on track as their token windows fill up with tool logs, stale diffs, and repeated file dumps.

What's covered:

- Why long context windows alone don't save you (position bias, distractor sensitivity)
- Observation masking vs. LLM summarization, and when simple beats clever
- Tool-output compression with approaches like LLMLingua-2
- Trajectory reduction: pruning dead branches from agent history
- Memory hierarchies: session → working set → notes → cross-session
- How MCP and standardized tool interfaces reduce context debt
- Dynamic context policies trained with RL (DeepMiner, MEM1)
- Meta-agent CI loops for measuring regressions across agent configs
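
As a rough illustration of the observation-masking idea from the list above, here is a minimal Python sketch. The message schema (dicts with `role` and `content`), the function name, and the placeholder text are assumptions for the example, not something taken from the survey.

```python
def mask_old_observations(messages, keep_last=2, placeholder="[tool output omitted]"):
    """Replace all but the most recent `keep_last` tool outputs with a placeholder.

    Non-tool messages are passed through untouched, so the agent keeps its
    reasoning and actions but drops the bulk of stale tool logs.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    if keep_last > 0:
        to_mask = set(tool_indices[:-keep_last])
    else:
        to_mask = set(tool_indices)
    return [
        {**m, "content": placeholder} if i in to_mask else m
        for i, m in enumerate(messages)
    ]
```

Unlike LLM summarization, this needs no extra model call: the history keeps its shape and only old observations shrink.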

The core argument: the engineering challenge isn't "make the model smarter"; it's "make the agent's context and verification smarter." That's where the real leverage is in 2026.

👉 Read the full post: https://blog.ivan.digital/context-engineering-for-agentic-hybrid-applications-why-code-agents-fail-and-how-to-fix-them-076cab699262

posted an update 23 days ago

Qwen3-ASR Swift: On-Device Speech Recognition for Apple Silicon

I'm excited to release https://github.com/ivan-digital/qwen3-asr-swift, an open-source Swift implementation of Alibaba's Qwen3-ASR, optimized for Apple Silicon using MLX.

Why Qwen3-ASR? Exceptional noise robustness: about 3.5x lower character error rate (CER) than Whisper in noisy conditions (17.9% vs 63%).
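
For readers unfamiliar with the metric, CER is the character-level edit distance between the reference transcript and the model output, divided by the reference length. A minimal Python sketch follows; the function names are illustrative, and real benchmarks usually normalize text (case, punctuation, whitespace) before scoring, which is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Ratio behind the "about 3.5x" figure quoted above.
ratio = 63 / 17.9
```

Because insertions count as errors, CER can exceed 100% when the hypothesis is much longer than the reference.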

Features:
- 52 languages (30 major + 22 Chinese dialects)
- ~600MB model (4-bit quantized)
- ~100ms latency on M-series chips
- Fully local, no cloud API

More on inference and model architecture in the blog post: https://blog.ivan.digital/qwen3-asr-swift-on-device-asr-tts-for-apple-silicon-architecture-and-benchmarks-27cbf1e4463f

updated a dataset 24 days ago

aufklarer/central-bank-communications

Viewer • Updated about 2 hours ago • 252k • 800 • 3

updated a dataset 25 days ago

aufklarer/central-bank-communications

Viewer • Updated about 2 hours ago • 252k • 800 • 3
