Moshika MLX — Mixed Precision (q4 bulk + q8 sensitive)

Mixed-precision MLX checkpoint of Kyutai Moshika. Targets ~5 GB of weights, making the 7B Moshi practical on 8 GB Apple Silicon Macs while keeping the quality-critical layers at q8.

Quantization policy

Path pattern	Bits	Group
`text_emb`	8	64
`text_linear`	8	64
`audio_embs.*`	8	64
`depformer.*` (all Linear / Embedding)	8	64
All `transformer.layers.*` Linear / Embedding	4	32
Norms, Mimi conv layers	—	—
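
The table above can be read as a first-match lookup from module path to quantization parameters. The following is a minimal sketch of that reading, not the converter's actual code: the function name `quant_params`, the string-based `kind` argument, and the example paths in the usage lines are assumptions made for illustration.

```python
from typing import Optional, Tuple

# (path prefix, bits, group_size), mirroring the table rows; first match wins.
POLICY = [
    ("text_emb", 8, 64),
    ("text_linear", 8, 64),
    ("audio_embs.", 8, 64),
    ("depformer.", 8, 64),
    ("transformer.layers.", 4, 32),
]

# Only these layer kinds are quantized; norms and conv layers are left alone.
QUANTIZABLE_KINDS = {"Linear", "Embedding"}


def quant_params(path: str, kind: str) -> Optional[Tuple[int, int]]:
    """Return (bits, group_size) for a module, or None to leave it unquantized."""
    if kind not in QUANTIZABLE_KINDS:
        return None
    for prefix, bits, group_size in POLICY:
        if path.startswith(prefix):
            return bits, group_size
    return None


# Hypothetical module paths, used only to exercise the lookup.
print(quant_params("text_emb", "Embedding"))
print(quant_params("transformer.layers.0.some_linear", "Linear"))
print(quant_params("transformer.layers.0.some_norm", "RMSNorm"))
```

Because the on-disk format carries no bits or group size, a loader would need an equivalent lookup to rebuild each quantized layer with the right parameters.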

The Mimi codec is not part of this checkpoint. Pair with the Mimi tokenizer from kyutai/moshika-mlx-bf16 (tokenizer-e351c8d8-checkpoint125.safetensors).

Memory (weights only)

Variant	Approx weights	Notes
BF16	~14 GB	Reference
q8 (group 64), uniform	~7.4 GB	`kyutai/moshika-mlx-q8`
mixed (this)	~5.0 GB	q4 bulk + q8 sensitive
q4 (group 32), uniform	~3.7 GB	`kyutai/moshika-mlx-q4`
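
As a rough sanity check, the mixed figure can be treated as a blend of the two uniform figures. The sketch below assumes each weight costs the same in the mixed checkpoint as in the matching uniform variant, and that unquantized tensors contribute equally to all three. The result reflects the rounded table values only and is not a measured parameter count.

```python
def bulk_fraction(mixed_gb: float, q4_gb: float, q8_gb: float) -> float:
    """Solve mixed = f * q4 + (1 - f) * q8 for f, the share of weights held at q4."""
    return (q8_gb - mixed_gb) / (q8_gb - q4_gb)


# Approximate sizes taken from the table above.
f = bulk_fraction(mixed_gb=5.0, q4_gb=3.7, q8_gb=7.4)
print(f"implied q4 share of weights: {f:.2f}")
```

Under these assumptions the table implies that roughly two thirds of the weights sit in the q4 bulk, with the remainder kept at q8.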

Usage

moshi-swift GUI

Select Moshi q4/q8 mixed in the app's model picker. First run downloads to the HF cache.

moshi-swift CLI

moshi-cli run hf://strumecki/moshika-mlx-mp/model.mp.safetensors \
    --config moshi7b \
    --mimi-model hf://kyutai/moshika-mlx-bf16/tokenizer-e351c8d8-checkpoint125.safetensors \
    --input mic

Reproducibility

Produced from kyutai/moshika-mlx-bf16/model.safetensors using scripts/convert_mixed_precision.py in the moshi-swift fork. Two passes of mlx.nn.quantize with mutually-exclusive class_predicate filters.
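
The two-pass scheme can be illustrated without MLX. In the sketch below, `Module` and `quantize_pass` are hypothetical stand-ins rather than MLX APIs, and the module paths are made up. Only the `(path, module) -> bool` predicate shape is meant to mirror what `class_predicate` receives. The point is that the two filters never accept the same module, so no layer is quantized twice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Module:
    """Stand-in for a model layer; not a real MLX class."""
    kind: str
    bits: Optional[int] = None
    group_size: Optional[int] = None


SENSITIVE_PREFIXES = ("text_emb", "text_linear", "audio_embs.", "depformer.")


def quantizable(module: Module) -> bool:
    return module.kind in {"Linear", "Embedding"}


def is_sensitive(path: str, module: Module) -> bool:
    return quantizable(module) and path.startswith(SENSITIVE_PREFIXES)


def is_bulk(path: str, module: Module) -> bool:
    return quantizable(module) and path.startswith("transformer.layers.")


def quantize_pass(modules: Dict[str, Module], bits: int, group_size: int,
                  class_predicate: Callable[[str, Module], bool]) -> List[str]:
    """Hypothetical single pass: quantize every module the predicate accepts."""
    touched = []
    for path, module in modules.items():
        if class_predicate(path, module):
            assert module.bits is None, f"{path} would be quantized twice"
            module.bits, module.group_size = bits, group_size
            touched.append(path)
    return touched


# Made-up module tree for illustration.
modules = {
    "text_emb": Module("Embedding"),
    "depformer.some_linear": Module("Linear"),
    "transformer.layers.0.some_linear": Module("Linear"),
    "transformer.layers.0.some_norm": Module("RMSNorm"),
}

pass_q8 = quantize_pass(modules, bits=8, group_size=64, class_predicate=is_sensitive)
pass_q4 = quantize_pass(modules, bits=4, group_size=32, class_predicate=is_bulk)
print(pass_q8, pass_q4)
```

With the real library, the same two predicates would presumably be passed to two successive `mlx.nn.quantize` calls, one with `bits=8, group_size=64` and one with `bits=4, group_size=32`.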

Limitations

All 32 main-LM transformer layers are q4, so quality may differ from uniform q8. Run subjective listening tests before relying on this checkpoint in production.
Quantization is structural: the on-disk format records no bits/group_size metadata. Loading in any runtime other than moshi-swift requires applying the same predicate before update(parameters:).
macOS / Apple Silicon only (mlx-swift requirement).

Attribution

All credit for the underlying model architecture, training, and the Mimi codec to Kyutai and the Moshi paper.


Paper

Moshi: a speech-text foundation model for real-time dialogue (arXiv 2410.00037, published Sep 17, 2024)