
Research Paper: "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"

## Description

This repository contains our proposed MD-SVD (Modality-Decoupled Singular Value Decomposition) initialization weights, extracted from Stage 1 checkpoints, for initializing Stage 2 MHA2MLA-VLM models. MD-SVD compresses the visual and textual KV spaces independently, enabling efficient KV-cache compression while maintaining model performance.
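
To make the idea concrete, below is a minimal sketch of a modality-decoupled SVD initialization. This is not the authors' implementation: the function `mdsvd_init`, the per-modality weight matrices `w_kv_visual` / `w_kv_text`, and the tensor shapes are illustrative assumptions. The only part taken from the description above is that visual and textual KV projections are factorized independently down to a latent dimension `d_kv`.

```python
import torch

def mdsvd_init(w_kv: torch.Tensor, d_kv: int):
    """Truncated-SVD factorization of a KV projection matrix.

    Approximates W ≈ down_proj @ up_proj with inner (latent)
    dimension d_kv, so the cache can store a d_kv-dim latent
    instead of the full per-head keys/values.
    """
    # Thin SVD, then keep only the top-d_kv singular directions.
    U, S, Vh = torch.linalg.svd(w_kv, full_matrices=False)
    sqrt_s = torch.sqrt(S[:d_kv])
    down_proj = U[:, :d_kv] * sqrt_s           # hidden -> latent
    up_proj = sqrt_s[:, None] * Vh[:d_kv, :]   # latent -> KV
    return down_proj, up_proj

# Modality-decoupled: factorize the visual and textual KV spaces
# independently rather than with one shared decomposition.
# Shapes and random weights below are placeholders for illustration.
hidden, kv_dim, d_kv = 3584, 512, 64
w_kv_visual = torch.randn(hidden, kv_dim)
w_kv_text = torch.randn(hidden, kv_dim)
vis_down, vis_up = mdsvd_init(w_kv_visual, d_kv)
txt_down, txt_up = mdsvd_init(w_kv_text, d_kv)
```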

## Available Weight Files

| File Name | Latent Dimension (d_kv) |
|---|---|
| Qwen2.5-VL-7B-rope32-d_kv_32.pt | 32 |
| Qwen2.5-VL-7B-rope32-d_kv_64.pt | 64 |
| Qwen2.5-VL-7B-rope32-d_kv_128.pt | 128 |
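
A minimal loading sketch, assuming the `.pt` files are ordinary PyTorch checkpoints loadable with `torch.load`; the key layout inside them is not documented here, so inspect the contents before merging them into a Stage 2 model:

```python
import torch

# Load the d_kv=64 initialization weights on CPU and list the
# stored tensors. Assumes the checkpoint is a dict of tensors.
state_dict = torch.load("Qwen2.5-VL-7B-rope32-d_kv_64.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```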

## Citation

```bibtex
@misc{fan2026mha2mlavlmenablingdeepseekseconomical,
      title={MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models},
      author={Xiaoran Fan and Zhichao Sun and Tao Ji and Lixing Shen and Tao Gui},
      year={2026},
      eprint={2601.11464},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.11464},
}
```