Moshika MLX: Mixed Precision (q4 bulk + q8 sensitive)

Mixed-precision MLX checkpoint of Kyutai Moshika. Targets ~5 GB of weights, making the 7B Moshi practical on 8 GB Apple Silicon Macs while keeping the quality-critical layers at q8.

Quantization policy

Path pattern                                   Bits  Group size
text_emb                                       8     64
text_linear                                    8     64
audio_embs.*                                   8     64
depformer.* (all Linear / Embedding)           8     64
transformer.layers.* (all Linear / Embedding)  4     32
Norms, Mimi conv layers                        -     -

The Mimi codec is not part of this checkpoint. Pair with the Mimi tokenizer from kyutai/moshika-mlx-bf16 (tokenizer-e351c8d8-checkpoint125.safetensors).

Memory (weights only)

Variant                 Approx. weights  Notes
BF16                    ~14 GB           Reference
q8 (group 64), uniform  ~7.4 GB          kyutai/moshika-mlx-q8
mixed (this)            ~5.0 GB          q4 bulk + q8 sensitive
q4 (group 32), uniform  ~3.7 GB          kyutai/moshika-mlx-q4

Usage

moshi-swift GUI

Select "Moshi q4/q8 mixed" in the app's model picker. The first run downloads the weights to the Hugging Face cache.

moshi-swift CLI

moshi-cli run hf://strumecki/moshika-mlx-mp/model.mp.safetensors \
    --config moshi7b \
    --mimi-model hf://kyutai/moshika-mlx-bf16/tokenizer-e351c8d8-checkpoint125.safetensors \
    --input mic

Reproducibility

Produced from kyutai/moshika-mlx-bf16/model.safetensors using scripts/convert_mixed_precision.py in the moshi-swift fork. The script runs two passes of mlx.nn.quantize with mutually exclusive class_predicate filters, so each layer is quantized at most once.
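
The two-pass approach can be sketched roughly as follows. This is an illustration and not the contents of convert_mixed_precision.py: the helper names (is_sensitive, is_bulk, quantize_mixed) are hypothetical, and the dotted module paths are assumed to match the patterns in the policy table. Prefix matching is used so that transformer layers nested under depformer.* are not picked up by the q4 pass. The mlx import is deferred so the predicates can be checked on machines without MLX.

```python
# Hypothetical helpers; the real script's predicates may differ.
SENSITIVE_ROOTS = ("text_emb", "text_linear", "audio_embs", "depformer")


def is_sensitive(path: str) -> bool:
    """True for module paths the policy table keeps at q8 (group 64)."""
    return any(path == r or path.startswith(r + ".") for r in SENSITIVE_ROOTS)


def is_bulk(path: str) -> bool:
    """True for main-LM transformer layers, which go to q4 (group 32)."""
    return path.startswith("transformer.layers.")


def quantize_mixed(model):
    """Apply both passes in place; `model` is assumed to be an mlx.nn.Module."""
    import mlx.nn as nn  # deferred: only needed when actually quantizing

    def quantizable(module) -> bool:
        # Already-quantized layers are no longer nn.Linear / nn.Embedding,
        # so the second pass cannot touch what the first pass converted.
        return isinstance(module, (nn.Linear, nn.Embedding))

    # Pass 1: quality-critical layers at 8 bits, group size 64.
    nn.quantize(model, group_size=64, bits=8,
                class_predicate=lambda p, m: quantizable(m) and is_sensitive(p))
    # Pass 2: bulk transformer layers at 4 bits, group size 32.
    nn.quantize(model, group_size=32, bits=4,
                class_predicate=lambda p, m: quantizable(m) and is_bulk(p))
    return model
```

Because the two path filters never both match the same path, the order of the passes should not matter.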

Limitations

  • All 32 main-LM transformer layers are q4. Quality may differ from uniform q8, so subjective listening tests are recommended before relying on this for production.
  • Quantization is structural: the on-disk format records no bits/group_size metadata. Loading in any runtime other than moshi-swift requires applying the same predicate before update(parameters:).
  • macOS / Apple Silicon only (mlx-swift requirement).
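
Since the file carries no bits/group_size metadata, a loader in another runtime may want to cross-check its predicate against the tensor shapes. The sketch below assumes MLX's usual affine layout, in which a quantized weight is packed into 32-bit words along the input dimension and the matching scales tensor has one column per group. The function name infer_quant and the example shapes are hypothetical. Shapes alone cannot separate every possible setting (for example 8 bits with group 16 looks the same as 4 bits with group 32), so the check is restricted to the two settings this checkpoint uses.

```python
from typing import Optional, Sequence, Tuple

# The two (bits, group_size) settings listed in the quantization policy.
CANDIDATES = [(8, 64), (4, 32)]


def infer_quant(
    weight_shape: Sequence[int],
    scales_shape: Sequence[int],
    candidates: Sequence[Tuple[int, int]] = CANDIDATES,
) -> Optional[Tuple[int, int]]:
    """Return the single candidate consistent with the shapes, else None.

    With in_features = n_groups * group_size and
    packed_cols = in_features * bits / 32, a valid setting satisfies
    packed_cols * 32 == n_groups * group_size * bits.
    """
    packed_cols = weight_shape[-1]
    n_groups = scales_shape[-1]
    matches = [
        (bits, group)
        for bits, group in candidates
        if packed_cols * 32 == n_groups * group * bits
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical layer with 4096 input features:
print(infer_quant((4096, 512), (4096, 128)))   # q4, group 32
print(infer_quant((4096, 1024), (4096, 64)))   # q8, group 64
```

A result of None for a layer that has a scales tensor suggests the predicate and the file disagree, and is worth investigating before loading.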

Attribution

All credit for the underlying model architecture, training, and the Mimi codec goes to Kyutai and the Moshi paper.
