# Qwen3.5 MTP Weights for tp906
Pre-extracted Multi-Token Prediction (MTP) sidecar weights for Qwen3.5 Mamba2-hybrid models.
Used by tp906-engine for speculative decoding (~10-15% decode speedup).
## Files
| Model | File | Size |
|---|---|---|
| Qwen3.5-0.8B | Qwen3.5-0.8B/mtp_weights.bin | 39 MB |
| Qwen3.5-2B | Qwen3.5-2B/mtp_weights.bin | 116 MB |
| Qwen3.5-4B | Qwen3.5-4B/mtp_weights.bin | 230 MB |
| Qwen3.5-9B | Qwen3.5-9B/mtp_weights.bin | 465 MB |
| Qwen3.5-27B | Qwen3.5-27B/mtp_weights.bin | 811 MB |
## Usage
Download the file matching your model and place it next to your GGUF file:

```bash
# Example: Qwen3.5-9B
cd /path/to/your/models/
wget https://huggingface.co/raspbfox/tp906-mtp/resolve/main/Qwen3.5-9B/mtp_weights.bin

# Your directory should look like:
#   Qwen3.5-9B-Q8_0.gguf
#   mtp_weights.bin   <-- tp906 auto-detects this

# Run with MTP
tp906-bench -m Qwen3.5-9B-Q8_0.gguf
```

tp906 auto-detects `mtp_weights.bin` in the same directory as the GGUF model. No flags needed.
## Format
MTP1 binary format with F16 tensors: 15 tensors per file, extracted from the official Qwen3.5 safetensors checkpoints (the `model.mtp_block.*` weights). BF16 source tensors are converted to F16 for MI50 compatibility.
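As a sketch of the BF16-to-F16 step mentioned above: BF16 is the upper 16 bits of an IEEE-754 float32, so the conversion amounts to widening each BF16 value to float32 and then rounding to half precision. The helper below is a hypothetical illustration using only the standard library, not part of tp906 or the extraction tooling:

```python
import struct

def bf16_to_f16(bf16_bits: int) -> int:
    """Convert one BF16 value (given as raw 16-bit integer) to F16 bits.

    A real converter would also clamp values outside the F16 range
    (|x| > 65504), which struct rejects with OverflowError.
    """
    # BF16 occupies the upper half of a float32: widen by shifting left 16.
    f32 = struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]
    # Round the float32 value to IEEE-754 half precision ("e" format).
    return struct.unpack("<H", struct.pack("<e", f32))[0]
```

For example, `bf16_to_f16(0x3F80)` (BF16 for 1.0) yields `0x3C00`, the F16 encoding of 1.0. Note that F16 has a narrower exponent range but more mantissa bits than BF16, so the conversion is exact for in-range values.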