---
license: apache-2.0
library_name: mlx
tags:
  - mlx
  - speech-to-text
  - asr
  - robust-asr
  - qwen3-asr
base_model:
  - zhifeixie/Mega-ASR
  - Qwen/Qwen3-ASR-1.7B
language:
  - en
  - zh
pipeline_tag: automatic-speech-recognition
---

# Mega-ASR-6bit

6-bit quantized **robust-merged** variant of [Mega-ASR](https://github.com/xzf-thu/Mega-ASR), in MLX format, for [mlx-audio](https://github.com/Blaizzy/mlx-audio).

> **No router — always-on robust.** The Mega-ASR robustness LoRA is **merged** into the Qwen3-ASR-1.7B base and then quantized, so the per-utterance clean/degraded **router is not present** (you cannot add fp32 LoRA deltas to quantized weights). This model always runs the robust path.
>
> For the **full dynamic Mega-ASR** — clean speech on the base path, noisy speech on the LoRA path — use [`mlx-community/Mega-ASR-bf16`](https://huggingface.co/mlx-community/Mega-ASR-bf16).
>
> Use this 6-bit variant for **noisy-only / memory-constrained** deployments: ~2 GB and ~4× faster than the dynamic model (no per-clip LoRA toggling).
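
The merge-then-quantize order described above can be illustrated with a small, dependency-free sketch. It is not the conversion script used for this model: the helper names (`merge_lora`, `quantize_group`, `dequantize_group`), the toy shapes, the LoRA scale of 2.0, and the use of one quantization group per weight row are all assumptions made for illustration, and MLX's own quantization (group size, packing, kernels) may differ. The point it shows is that once a weight is stored as integer codes plus a per-group step and offset, there is no dense float matrix left to add a LoRA delta to or remove it from per utterance, so the delta has to be folded in first.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, scale):
    """Fold a LoRA update into the dense weight: W' = W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [wv + scale * dv for wv, dv in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def quantize_group(values, bits):
    """Affine-quantize one group of floats to unsigned `bits`-bit integer codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((v - lo) / step) for v in values]
    return codes, step, lo


def dequantize_group(codes, step, lo):
    """Map integer codes back to approximate float values."""
    return [c * step + lo for c in codes]


# Toy shapes and values (hypothetical, chosen only to keep the demo small).
random.seed(0)
out_dim, in_dim, rank = 4, 8, 2
w = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
lora_a = [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(rank)]
lora_b = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(out_dim)]

# Step 1: merge in full precision. Step 2: quantize the merged rows to 6 bits.
merged = merge_lora(w, lora_a, lora_b, scale=2.0)
quantized = [quantize_group(row, bits=6) for row in merged]

worst = max(
    abs(orig - approx)
    for row, (codes, step, lo) in zip(merged, quantized)
    for orig, approx in zip(row, dequantize_group(codes, step, lo))
)
print(f"largest round-trip error after 6-bit quantization: {worst:.5f}")
```

Each quantized row keeps only 6-bit codes and two floats, which is why the router's clean path (base weights without the delta) cannot be recovered from this checkpoint.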

## Use with mlx-audio

```bash
pip install mlx-audio
```

```python
from mlx_audio.stt import load

model = load("mlx-community/Mega-ASR-6bit")
result = model.generate("audio.wav", language="en")
print(result.text)
```
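
For more than one file, a small wrapper around the same `generate` call may be convenient. The sketch below is illustrative rather than part of mlx-audio: `transcribe_folder` is a hypothetical helper, and it assumes only what the snippet above already shows, namely that `model.generate(path, language=...)` returns an object with a `.text` string.

```python
from pathlib import Path


def transcribe_folder(model, folder, language="en", pattern="*.wav"):
    """Transcribe every file in `folder` matching `pattern`.

    `model` is any object exposing generate(path, language=...) whose
    result has a `.text` attribute. Returns {filename: transcript},
    ordered by filename.
    """
    transcripts = {}
    for path in sorted(Path(folder).glob(pattern)):
        result = model.generate(str(path), language=language)
        transcripts[path.name] = result.text.strip()
    return transcripts
```

With the model loaded as above, `transcribe_folder(model, "recordings")` would return one transcript per `.wav` file in a `recordings` directory (the directory name is a placeholder).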

## Quality

6-bit is effectively **lossless** versus bf16 on noisy speech: the two are within 0.06 WER points of each other. Word error rate (WER, lower is better) on a NOIZEUS subset (merged-robust path):

| Precision | Overall WER (%) | Size |
|---|---:|---:|
| bf16 | 7.95 | 4.08 GB |
| **6-bit (this model)** | **7.89** | 2.04 GB |
| 8-bit | 8.06 | 2.47 GB |

(4-bit degrades to 10.78 WER and is not published.)
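
For readers who want to run a comparable check on their own audio, WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch under stated assumptions: `word_error_rate` is a hypothetical helper, it normalizes only by lowercasing and whitespace splitting, and this card does not state which scoring tool or text normalization produced the numbers in the table, so results from this function are not guaranteed to match them.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)
```

A corpus-level figure is usually computed by summing the edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance percentages.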

## License & attribution

Apache-2.0. Built on [zhifeixie/Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (adapter + router) and [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) (base).