# Qwen3.5 MTP Weights for tp906

Pre-extracted Multi-Token Prediction (MTP) sidecar weights for Qwen3.5 Mamba2-hybrid models.

Used by tp906-engine for speculative decoding (~10-15% decode speedup).

## Files

| Model | File | Size |
|---|---|---|
| Qwen3.5-0.8B | `Qwen3.5-0.8B/mtp_weights.bin` | 39 MB |
| Qwen3.5-2B | `Qwen3.5-2B/mtp_weights.bin` | 116 MB |
| Qwen3.5-4B | `Qwen3.5-4B/mtp_weights.bin` | 230 MB |
| Qwen3.5-9B | `Qwen3.5-9B/mtp_weights.bin` | 465 MB |
| Qwen3.5-27B | `Qwen3.5-27B/mtp_weights.bin` | 811 MB |

## Usage

Download the file matching your model and place it next to your GGUF file:

```bash
# Example: Qwen3.5-9B
cd /path/to/your/models/
wget https://huggingface.co/raspbfox/tp906-mtp/resolve/main/Qwen3.5-9B/mtp_weights.bin

# Your directory should look like:
#   Qwen3.5-9B-Q8_0.gguf
#   mtp_weights.bin          <-- tp906 auto-detects this

# Run with MTP
tp906-bench -m Qwen3.5-9B-Q8_0.gguf
```

tp906 auto-detects `mtp_weights.bin` in the same directory as the GGUF model. No flags needed.

## Format

MTP1 binary format (F16 tensors). Each file contains 15 tensors extracted from the official Qwen3.5 safetensors (the `model.mtp_block.*` weights). BF16 source tensors are converted to F16 for MI50 compatibility.
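The MTP1 container layout (header, tensor index) is not documented here, so the sketch below only illustrates the F16 payload side: decoding a run of little-endian half-precision values from a byte buffer, using Python's stdlib `struct` half-float support. The helper name `read_f16` is our own illustration, not part of tp906.

```python
import struct


def read_f16(buf: bytes, count: int, offset: int = 0) -> list[float]:
    # Decode `count` little-endian IEEE-754 half-precision (F16) values
    # starting at `offset`. This is only the raw-payload decoding step;
    # locating each tensor within an MTP1 file requires the (undocumented)
    # header and index layout.
    return list(struct.unpack_from(f"<{count}e", buf, offset))
```

Values like 0.5 or -2.0 round-trip exactly through F16; arbitrary BF16 source values lose some precision in the BF16-to-F16 conversion, since F16 trades exponent range for mantissa bits.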
