	---
	license: apache-2.0
	library_name: mlx
	tags:
	- mlx
	- speech-to-text
	- asr
	- robust-asr
	- qwen3-asr
	base_model:
	- zhifeixie/Mega-ASR
	- Qwen/Qwen3-ASR-1.7B
	language:
	- en
	- zh
	pipeline_tag: automatic-speech-recognition
	---

	# Mega-ASR-6bit

	A 6-bit quantized, robust-merged variant of [Mega-ASR](https://github.com/xzf-thu/Mega-ASR) in MLX format, for use with [mlx-audio](https://github.com/Blaizzy/mlx-audio).

	> No router: the robust path is always on. The Mega-ASR robustness LoRA is merged into the Qwen3-ASR-1.7B base before quantization, so the per-utterance clean/degraded router is absent (fp32 LoRA deltas cannot be added to weights that are already quantized). This model always runs the robust path.
	>
	> For the full dynamic Mega-ASR — clean speech on the base path, noisy speech on the LoRA path — use [`mlx-community/Mega-ASR-bf16`](https://huggingface.co/mlx-community/Mega-ASR-bf16).
	>
	> Use this 6-bit variant for noisy-only or memory-constrained deployments: it is about 2 GB and roughly 4× faster than the dynamic model, because it skips per-clip LoRA toggling.
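The merge-then-quantize order described above can be illustrated with a toy sketch. It assumes the standard LoRA merge `W + scale * (B @ A)` and a simplified per-row affine quantizer; the function names, the tiny matrices, and the per-row scheme are all hypothetical stand-ins, not the conversion code actually used for this model.

```python
# Toy illustration only: small nested lists, not real model weights.

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, scale):
    """Fold a LoRA update into the base weights: W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def quantize_row(row, bits=6):
    """Affine-quantize one row to integer codes plus a (step, offset) pair."""
    lo, hi = min(row), max(row)
    levels = (1 << bits) - 1          # 63 steps for 6 bits
    step = (hi - lo) / levels or 1.0  # avoid a zero step for constant rows
    codes = [round((v - lo) / step) for v in row]
    return codes, step, lo

def dequantize_row(codes, step, lo):
    """Recover approximate float values from the integer codes."""
    return [c * step + lo for c in codes]

# Hypothetical base weights (2x4) and a rank-1 adapter.
w = [[0.5, -0.25, 0.0, 1.0], [0.1, 0.2, -0.3, 0.4]]
lora_a = [[1.0, 0.0, -1.0, 2.0]]   # rank x in_features
lora_b = [[0.5], [-0.5]]           # out_features x rank

merged = merge_lora(w, lora_a, lora_b, scale=2.0)       # merge in float first
packed = [quantize_row(row, bits=6) for row in merged]  # then quantize
restored = [dequantize_row(*p) for p in packed]

max_err = max(abs(m - r)
              for m_row, r_row in zip(merged, restored)
              for m, r in zip(m_row, r_row))
print("max round-trip error:", max_err)
```

After quantization only integer codes and a step/offset per row remain, so a float delta has nowhere to go without dequantizing first. That is why the adapter has to be merged before quantization and cannot be toggled per clip afterwards.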

	## Use with mlx-audio

	```bash
	pip install mlx-audio
	```

	```python
	from mlx_audio.stt import load

	model = load("mlx-community/Mega-ASR-6bit")
	result = model.generate("audio.wav", language="en")
	print(result.text)
	```
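For more than one file, a small wrapper around the same calls may be convenient. The sketch below relies only on `model.generate(path, language=...)` and `result.text` as shown above; `transcribe_folder`, its `pattern` argument, and the `recordings/` directory are hypothetical names introduced for illustration.

```python
from pathlib import Path

def transcribe_folder(model, folder, language="en", pattern="*.wav"):
    """Transcribe every file in `folder` matching `pattern`.

    `model` is any object exposing generate(path, language=...) that returns
    a result with a `.text` attribute, as in the single-file example.
    Returns a dict mapping file name to transcript, in sorted file order.
    """
    transcripts = {}
    for path in sorted(Path(folder).glob(pattern)):
        result = model.generate(str(path), language=language)
        transcripts[path.name] = result.text
    return transcripts

# Example usage (requires mlx-audio and the model weights):
#   from mlx_audio.stt import load
#   model = load("mlx-community/Mega-ASR-6bit")
#   for name, text in transcribe_folder(model, "recordings/").items():
#       print(name, text)
```

Loading the model once and reusing it across files avoids paying the load cost per clip.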

	## Quality

	6-bit is effectively lossless versus bf16 on noisy speech. WER on a NOIZEUS subset (merged-robust path):

	| Precision | Overall WER (%) | Size |
	|---|---:|---:|
	| bf16 | 7.95 | 4.08 GB |
	| 6-bit (this model) | 7.89 | 2.04 GB |
	| 8-bit | 8.06 | 2.47 GB |

	(4-bit degrades to 10.78 WER and is not published.)
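The card does not say how WER was scored, so the following is only a minimal sketch of one common definition: word-level edit distance summed over all utterances, divided by the total number of reference words. The lowercasing and whitespace tokenization are assumptions, and `word_edit_distance` and `corpus_wer` are hypothetical helper names; a real evaluation would likely apply more careful text normalization.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def corpus_wer(pairs):
    """Overall WER in percent over (reference, hypothesis) string pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words to score against")
    return 100.0 * errors / words

pairs = [
    ("the cat sat on the mat", "the cat sat on mat"),  # one deletion
    ("hello world", "hello word"),                     # one substitution
]
print(corpus_wer(pairs))
```

Pooling errors across utterances before dividing keeps long and short clips weighted by their word counts, which is the usual meaning of an "overall" figure.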

	## License & attribution

	Apache-2.0. Built on [zhifeixie/Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (adapter + router) and [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) (base).