A 3D medical LVLM trained on CT-RATE (3D chest CT volumes paired with English medical texts), supporting tasks such as report generation and medical VQA.

| Config | Value |
|---|---|
| Image encoder | google/siglip-base-patch16-256-multilingual |
| Connector | Cross-Attention + MLP (2-layer) |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 32×256×256 |
| Sequence length | 2048 |
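
To give a feel for how the image resolution and the sequence length in the table relate, here is a rough token-count sketch. It assumes that the resolution is ordered as slices × height × width and that each slice is split into non-overlapping 16×16 patches, as the `patch16` in the encoder name suggests. Neither the axis order nor the per-slice encoding is stated on this card, so treat the numbers as illustrative arithmetic only.

```python
# Token-budget arithmetic for the configuration table above.
# Assumption (not stated on the card): the volume is 32 slices of 256x256,
# and every slice is patchified independently by the 2D image encoder.

PATCH_SIZE = 16                       # from "patch16" in the encoder name
SLICES, HEIGHT, WIDTH = 32, 256, 256  # image resolution; axis order is assumed
SEQUENCE_LENGTH = 2048                # from the table


def patches_per_slice(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens for one 2D slice."""
    return (height // patch) * (width // patch)


per_slice = patches_per_slice(HEIGHT, WIDTH, PATCH_SIZE)
per_volume = SLICES * per_slice

print(per_slice)                     # 256
print(per_volume)                    # 8192
print(per_volume / SEQUENCE_LENGTH)  # 4.0
```

Under these assumptions the raw patch tokens of one volume would be four times the LLM sequence length, which is one plausible reason for placing a connector between the encoder and the LLM. The number of visual tokens the connector actually emits is not given here.
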

| Task | CT-CHAT | MedM-VL-CT-Chest (3D) | MedM-VL-CT-Chest (2D+Avg) | MedM-VL-CT-Chest (2D+Attn) |
|---|---|---|---|---|
| Long answer | 0.482 | 0.619 | 0.622 | 0.623 |
| Short answer | 0.274 | 0.658 | 0.664 | 0.667 |
| Multiple choice | 0.838 | 0.924 | 0.920 | 0.925 |
| Report generation | 0.395 | 0.419 | 0.441 | 0.439 |
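
The comparison in the table can be summarized with a small script. The sketch below copies the scores from the table as-is and computes, per task, the absolute gain of each MedM-VL variant over CT-CHAT and the best-scoring variant. The helper name `gains_over_baseline` is made up for this illustration, and the metric behind each score is not specified on this card.

```python
# Scores copied verbatim from the results table above.
BASELINE = "CT-CHAT"
scores = {
    "Long answer": {"CT-CHAT": 0.482, "MedM-VL-CT-Chest (3D)": 0.619,
                    "MedM-VL-CT-Chest (2D+Avg)": 0.622, "MedM-VL-CT-Chest (2D+Attn)": 0.623},
    "Short answer": {"CT-CHAT": 0.274, "MedM-VL-CT-Chest (3D)": 0.658,
                     "MedM-VL-CT-Chest (2D+Avg)": 0.664, "MedM-VL-CT-Chest (2D+Attn)": 0.667},
    "Multiple choice": {"CT-CHAT": 0.838, "MedM-VL-CT-Chest (3D)": 0.924,
                        "MedM-VL-CT-Chest (2D+Avg)": 0.920, "MedM-VL-CT-Chest (2D+Attn)": 0.925},
    "Report generation": {"CT-CHAT": 0.395, "MedM-VL-CT-Chest (3D)": 0.419,
                          "MedM-VL-CT-Chest (2D+Avg)": 0.441, "MedM-VL-CT-Chest (2D+Attn)": 0.439},
}


def gains_over_baseline(row: dict) -> dict:
    """Absolute score difference of every non-baseline model versus the baseline."""
    base = row[BASELINE]
    return {name: round(value - base, 3) for name, value in row.items() if name != BASELINE}


gains = {task: gains_over_baseline(row) for task, row in scores.items()}
best = {
    task: max((name for name in row if name != BASELINE), key=row.get)
    for task, row in scores.items()
}

for task in scores:
    print(f"{task}: best = {best[task]}, gains = {gains[task]}")
```

On these numbers the 2D+Attn variant scores highest on the three question-answering tasks, while 2D+Avg is slightly ahead on report generation.
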
For further details, please refer to the MedM-VL project.
@inproceedings{shi2025medm,
title={{MedM-VL}: What makes a good medical {LVLM}?},
author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Fu, Xiangling and Li, Miao and Wu, Ji},
booktitle={International Workshop on Agentic AI for Medicine},
pages={290--299},
year={2025},
organization={Springer}
}