metadata
license: apache-2.0
library_name: mlx
tags:
  - mlx
  - speech-to-text
  - asr
  - robust-asr
  - qwen3-asr
base_model:
  - zhifeixie/Mega-ASR
  - Qwen/Qwen3-ASR-1.7B
language:
  - en
  - zh
pipeline_tag: automatic-speech-recognition

Mega-ASR-bf16

This model was converted to MLX format from zhifeixie/Mega-ASR (built on Qwen/Qwen3-ASR-1.7B) using mlx-audio.

Mega-ASR is a robustness layer over Qwen3-ASR-1.7B. A tiny audio-quality router classifies each utterance as clean or degraded and switches a dense LoRA adapter in or out of the base weights at inference: degraded audio runs the LoRA (robust) path, and clean audio runs the unmodified base path. This yields large WER reductions on noisy and far-field speech while leaving clean-speech accuracy unchanged.

The base weights are stored as dense bf16 on purpose: Mega-ASR adds fp32 LoRA deltas to the base at inference, so the base cannot be quantized without losing the runtime router/LoRA switching.
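
As a rough illustration of the switching described above, the sketch below adds a low-rank delta to a base weight only when a router flags the input as degraded. It is a toy in plain Python, not the Mega-ASR or mlx-audio implementation: the names `route` and `effective_weight`, the 0.5 threshold, the scale, and the tiny matrices are all hypothetical.

```python
# Toy sketch of router-gated LoRA; every name and number here is hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def route(degraded_score, threshold=0.5):
    """Stand-in for the audio-quality router: True means 'degraded'."""
    return degraded_score >= threshold


def effective_weight(base, lora_b, lora_a, scale, degraded):
    """Return base unchanged for clean audio, base + scale * (B @ A) otherwise."""
    if not degraded:
        return base  # clean path: unmodified base weights
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


base = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
lora_b = [[1.0], [2.0]]          # 2x1 factor (rank 1)
lora_a = [[0.5, 0.25]]           # 1x2 factor

clean_w = effective_weight(base, lora_b, lora_a, scale=2.0, degraded=route(0.1))
noisy_w = effective_weight(base, lora_b, lora_a, scale=2.0, degraded=route(0.9))
```

In this toy the delta is added to the dense weight itself, which is one way to see why the base is kept in a dense format rather than a quantized one.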

Use with mlx-audio

Install:

pip install mlx-audio

Python:

from mlx_audio.stt import load

model = load("mlx-community/Mega-ASR-bf16")
result = model.generate("audio.wav", language="en")
print(result.text)

CLI:

python -m mlx_audio.stt.generate --model mlx-community/Mega-ASR-bf16 --audio audio.wav

The router decides per-utterance automatically; no flags needed.
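
To transcribe several files, the same calls can be wrapped in a small helper. This is a minimal sketch that relies only on `load`, `model.generate(..., language=...)` and `result.text` as used in the Python snippet above; the helper name `transcribe_all`, the `main` function, and the file names are hypothetical.

```python
def transcribe_all(model, paths, language="en"):
    """Return {path: transcript} using the generate() call shown above."""
    transcripts = {}
    for path in paths:
        result = model.generate(path, language=language)
        transcripts[path] = result.text
    return transcripts


def main():
    # Requires mlx-audio and local audio files; the file names are placeholders.
    from mlx_audio.stt import load

    model = load("mlx-community/Mega-ASR-bf16")
    for path, text in transcribe_all(model, ["first.wav", "second.wav"]).items():
        print(f"{path}: {text}")
```

Calling `main()` loads the model once and reuses it for every file, which avoids reloading the weights for each utterance.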

Validation

This conversion reproduces the robustness gains published in the paper. Word Error Rate (WER, %) on the real NOIZEUS corpus (8 noise types × 4 SNR levels × 30 utterances, run on Apple Silicon):

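| SNR | base (Qwen3-ASR) | Mega-ASR (robust) | paper base | paper robust |
|---|---|---|---|---|
| 0 dB | 23.35 | 20.61 | 23.97 | 19.80 |
| 5 dB | 8.47 | 6.51 | - | - |
| 10 dB | 3.31 | 2.17 | 3.41 | 2.79 |
| 15 dB | 2.12 | 0.83 | - | - |
| overall | 9.31 | 7.53 | 9.45 | 7.52 |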
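
For reference, WER is conventionally the word-level edit distance between hypothesis and reference, divided by the number of reference words; the table presumably reports that ratio as a percentage. The sketch below shows the standard definition in plain Python. The text normalization behind the numbers above is not stated here, so the lowercase-and-split step is an assumption.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
score = wer("the cat sat on the mat", "the cat sit on mat")
```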

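The overall robust WER is 7.53 versus the paper's 7.52. Relative to the 9.31 Qwen3-ASR baseline measured here, that is a reduction of about 19% (about 20% with the paper's figures), so the published gain is reproduced. On clean read speech (FLEURS) the model matches plain Qwen3-ASR, as intended.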
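
The arithmetic behind these figures can be checked from the table. The short sketch below assumes the overall row is an unweighted mean of the four SNR rows, which is consistent with the reported 9.31 and 7.53; the variable names are illustrative.

```python
from statistics import mean

# WER values (%) per SNR from the table above: 0, 5, 10, 15 dB.
base_wer = [23.35, 8.47, 3.31, 2.12]
robust_wer = [20.61, 6.51, 2.17, 0.83]

base_overall = mean(base_wer)      # about 9.31
robust_overall = mean(robust_wer)  # about 7.53

# Relative reduction of the robust path over the base path.
relative_reduction = (base_overall - robust_overall) / base_overall

# Same ratio from the paper's overall figures quoted in the table.
paper_reduction = (9.45 - 7.52) / 9.45

print(f"measured: {relative_reduction:.1%}, paper: {paper_reduction:.1%}")
```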

License & attribution

Apache-2.0. Built on zhifeixie/Mega-ASR (adapter + router) and Qwen/Qwen3-ASR-1.7B (base).