MHA2MLA-VLM Collection
The MHA2MLA-VLM model published in the paper "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"
Research Paper: "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"
This repository contains our proposed MD-SVD (Modality-Decoupled Singular Value Decomposition) initialization weights, extracted from Stage 1 checkpoints and used to initialize Stage 2 MHA2MLA-VLM models. MD-SVD compresses the visual and textual KV spaces independently, enabling efficient compression while maintaining model performance; a minimal sketch of the idea follows the file table below.
| File Name | Latent Dimension (d_kv) |
|---|---|
| Qwen2.5-VL-7B-rope32-d_kv_32.pt | 32 |
| Qwen2.5-VL-7B-rope32-d_kv_64.pt | 64 |
| Qwen2.5-VL-7B-rope32-d_kv_128.pt | 128 |
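
The sketch below is a minimal illustration of the modality-decoupled low-rank idea: a truncated SVD of a KV projection weight, applied independently to the vision and text KV weights so that each modality receives its own d_kv-dimensional latent space. The function and tensor names are illustrative assumptions and do not reflect the exact contents or format of the released .pt files.

```python
import torch

def svd_low_rank_init(w_kv: torch.Tensor, d_kv: int):
    """Truncated SVD of a KV projection weight (illustrative sketch only).

    w_kv: (out_features, in_features) key/value projection weight.
    Returns a down-projection into a d_kv-dimensional latent space and an
    up-projection that reconstructs the original KV dimensions from it.
    """
    u, s, vh = torch.linalg.svd(w_kv, full_matrices=False)
    sqrt_s = s[:d_kv].sqrt()
    w_down = sqrt_s.unsqueeze(1) * vh[:d_kv]   # (d_kv, in_features)
    w_up = u[:, :d_kv] * sqrt_s.unsqueeze(0)   # (out_features, d_kv)
    return w_down, w_up

# Applied per modality, so the visual and textual KV spaces are compressed
# independently (the weight names here are hypothetical):
# w_down_vis, w_up_vis = svd_low_rank_init(w_kv_vision, d_kv=64)
# w_down_txt, w_up_txt = svd_low_rank_init(w_kv_text, d_kv=64)
```

The released files themselves are standard PyTorch checkpoints and can typically be loaded with `torch.load("Qwen2.5-VL-7B-rope32-d_kv_64.pt", map_location="cpu")` to initialize the latent KV projections of the corresponding Stage 2 model.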
@misc{fan2026mha2mlavlmenablingdeepseekseconomical,
title={MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models},
author={Xiaoran Fan and Zhichao Sun and Tao Ji and Lixing Shen and Tao Gui},
year={2026},
eprint={2601.11464},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.11464},
}