---
license: apache-2.0
library_name: mlx
tags:
- mlx
- speech-to-text
- asr
- robust-asr
- qwen3-asr
base_model:
- zhifeixie/Mega-ASR
- Qwen/Qwen3-ASR-1.7B
language:
- en
- zh
pipeline_tag: automatic-speech-recognition
---

# Mega-ASR-6bit

6-bit quantized **robust-merged** variant of [Mega-ASR](https://github.com/xzf-thu/Mega-ASR), in MLX format, for [mlx-audio](https://github.com/Blaizzy/mlx-audio).

> **No router: always-on robust.** The Mega-ASR robustness LoRA is **merged** into the Qwen3-ASR-1.7B base and then quantized, so the per-utterance clean/degraded **router is not present** (fp32 LoRA deltas cannot be added to already-quantized weights). This model always runs the robust path.
>
> For the **full dynamic Mega-ASR** (clean speech on the base path, noisy speech on the LoRA path), use [`mlx-community/Mega-ASR-bf16`](https://huggingface.co/mlx-community/Mega-ASR-bf16).
>
> Use this 6-bit variant for **noisy-only or memory-constrained** deployments: ~2 GB and ~4× faster than the dynamic model (no per-clip LoRA toggling).
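
The merge-then-quantize order described above can be illustrated with a toy example. This is a minimal pure-Python sketch, not the actual Mega-ASR or mlx-audio conversion code: the 6-bit width comes from this model card, while the helper names (`merge_lora`, `quantize_group`, `dequantize_group`), the per-row grouping, the matrix sizes, and the `scale=2.0` factor are all assumptions chosen for illustration.

```python
import random

def matmul(x, y):
    """Nested-list matrix product of x (m by k) and y (k by n)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(w, a, b, scale):
    """Fold a low-rank update into a dense weight: W' = W + scale * (B @ A)."""
    delta = matmul(b, a)
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def quantize_group(values, bits=6):
    """Affine-quantize one group of floats to integer codes plus (step, offset)."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1                      # 63 steps for 6 bits
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, step, lo

def dequantize_group(codes, step, lo):
    """Map integer codes back to approximate float values."""
    return [c * step + lo for c in codes]

random.seed(0)
out_dim, in_dim, rank = 4, 16, 2               # toy sizes, hypothetical
w = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
a = [[random.gauss(0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
b = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(out_dim)]

# 1. Merge while everything is still floating point.
merged = merge_lora(w, a, b, scale=2.0)

# 2. Quantize the merged weight, one group per row in this sketch.
quantized = [quantize_group(row, bits=6) for row in merged]
restored = [dequantize_group(*q) for q in quantized]

max_step = max(step for _, step, _ in quantized)
max_err = max(abs(r - m)
              for r_row, m_row in zip(restored, merged)
              for r, m in zip(r_row, m_row))
print(f"largest rounding error: {max_err:.4f} (largest step: {max_step:.4f})")
```

After step 2 the weights exist only as small integer codes with a per-group step and offset, so a float delta can no longer be added directly. Applying the adapter afterwards would require dequantizing, adding, and requantizing for every clip, which is the toggling cost the merged variant avoids.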

## Use with mlx-audio

```bash
pip install mlx-audio
```

```python
from mlx_audio.stt import load

# Load the 6-bit model by its Hub repo ID.
model = load("mlx-community/Mega-ASR-6bit")

# Transcribe a local audio file, passing the spoken language ("en" or "zh").
result = model.generate("audio.wav", language="en")
print(result.text)
```

## Quality

6-bit is effectively **lossless** versus bf16 on noisy speech. WER on a NOIZEUS subset (merged-robust path):

| Precision | Overall WER | Size |
|---|---:|---:|
| bf16 | 7.95 | 4.08 GB |
| **6-bit (this model)** | **7.89** | 2.04 GB |
| 8-bit | 8.06 | 2.47 GB |

(4-bit degrades to 10.78 WER and is not published.)
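
For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below shows one common way to compute it. It is not the scoring script behind the figures above: this card does not state which tool or text normalization was used, so the whitespace tokenization and the `word_error_rate` helper are assumptions, and the sentence pair is made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One dropped word ("the") and one substitution ("planks" -> "plank")
# against an 8-word reference.
wer = word_error_rate("the birch canoe slid on the smooth planks",
                      "the birch canoe slid on smooth plank")
print(f"{100 * wer:.2f}")  # prints 25.00
```

Real evaluations usually lowercase and strip punctuation before scoring, and an overall figure is typically computed from total errors over total reference words across all clips rather than by averaging per-clip rates.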

## License & attribution

Apache-2.0. Built on [zhifeixie/Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (adapter + router) and [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) (base).