# Qwen3.5 MTP Weights for tp906

Pre-extracted Multi-Token Prediction (MTP) sidecar weights for Qwen3.5 Mamba2-hybrid models.

Used by tp906-engine for speculative decoding (~10-15% decode speedup).

## Files

| Model | File | Size |
|---|---|---|
| Qwen3.5-0.8B | `Qwen3.5-0.8B/mtp_weights.bin` | 39 MB |
| Qwen3.5-2B | `Qwen3.5-2B/mtp_weights.bin` | 116 MB |
| Qwen3.5-4B | `Qwen3.5-4B/mtp_weights.bin` | 230 MB |
| Qwen3.5-9B | `Qwen3.5-9B/mtp_weights.bin` | 465 MB |
| Qwen3.5-27B | `Qwen3.5-27B/mtp_weights.bin` | 811 MB |

## Usage

Download the file matching your model and place it next to your GGUF file:

```bash
# Example: Qwen3.5-9B
cd /path/to/your/models/
wget https://huggingface.co/raspbfox/tp906-mtp/resolve/main/Qwen3.5-9B/mtp_weights.bin

# Your directory should look like:
#   Qwen3.5-9B-Q8_0.gguf
#   mtp_weights.bin          <-- tp906 auto-detects this

# Run with MTP
tp906-bench -m Qwen3.5-9B-Q8_0.gguf
```

tp906 auto-detects `mtp_weights.bin` in the same directory as the GGUF model. No flags needed.

## Format

MTP1 binary format (F16 tensors). Each file contains 15 tensors extracted from the official Qwen3.5 safetensors (the `model.mtp_block.*` weights). BF16 source tensors are converted to F16 for MI50 compatibility.
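The MTP1 container layout (header, tensor index) is not documented here, so the sketch below only illustrates the F16 payload side: decoding a run of little-endian half-precision values from a byte buffer, using Python's stdlib `struct` half-float support. The helper name `read_f16` is our own illustration, not part of tp906.

```python
import struct


def read_f16(buf: bytes, count: int, offset: int = 0) -> list[float]:
    # Decode `count` little-endian IEEE-754 half-precision (F16) values
    # starting at `offset`. This is only the raw-payload decoding step;
    # locating each tensor within an MTP1 file requires the (undocumented)
    # header and index layout.
    return list(struct.unpack_from(f"<{count}e", buf, offset))
```

Values like 0.5 or -2.0 round-trip exactly through F16; arbitrary BF16 source values lose some precision in the BF16-to-F16 conversion, since F16 trades exponent range for mantissa bits.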
