MedM-VL-CT-Chest-3B-en

Introduction

MedM-VL-CT-Chest-3B-en is a 3D medical large vision-language model (LVLM) trained on 3D chest CT volumes and English medical texts from CT-RATE. It supports tasks such as report generation and medical visual question answering (VQA).

Config

| Component | Value |
| --- | --- |
| Image encoder | google/siglip-base-patch16-256-multilingual |
| Connector | Cross-Attention + MLP (2-layer) |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 32 × 256 × 256 |
| Sequence length | 2048 |
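
To get a feel for the input size implied by this configuration, the following minimal sketch counts the patch tokens a ViT-style encoder would produce. It assumes (this card does not state it) that the first dimension of the image resolution, 32, is the number of slices, and that each 256 × 256 slice is encoded independently with the 16-pixel patches implied by the encoder name. `patch_tokens` is a hypothetical helper, not part of MedM-VL.

```python
# Hypothetical helper (not part of MedM-VL): counts ViT patch tokens
# for a volume whose slices are encoded one at a time.
# Assumption: shape is (slices, height, width) and patches do not overlap.
def patch_tokens(depth: int, height: int, width: int, patch: int = 16) -> tuple[int, int]:
    """Return (tokens per slice, tokens per volume)."""
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    per_slice = (height // patch) * (width // patch)
    return per_slice, depth * per_slice


per_slice, per_volume = patch_tokens(32, 256, 256)
print(per_slice, per_volume)  # 256 8192
```

Under these assumptions the raw patch count (8192) is larger than the 2048-token sequence length, so the visual tokens would have to be reduced before reaching the LLM. The card does not say how many visual tokens the connector actually emits.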

Evaluation

| Task | CT-CHAT | MedM-VL-CT-Chest (3D) | MedM-VL-CT-Chest (2D+Avg) | MedM-VL-CT-Chest (2D+Attn) |
| --- | --- | --- | --- | --- |
| Long answer | 0.482 | 0.619 | 0.622 | 0.623 |
| Short answer | 0.274 | 0.658 | 0.664 | 0.667 |
| Multiple choice | 0.838 | 0.924 | 0.920 | 0.925 |
| Report generation | 0.395 | 0.419 | 0.441 | 0.439 |
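
The comparison can be made explicit with a short script. This is only a sketch over the numbers reported above: it assumes that higher scores are better (the card does not name the metric), and the names `scores`, `gains`, and `best` are illustrative.

```python
# Scores copied from the evaluation table: CT-CHAT first, then the three
# MedM-VL-CT-Chest variants. Assumption: higher is better.
variants = ("3D", "2D+Avg", "2D+Attn")
scores = {
    "Long answer": (0.482, 0.619, 0.622, 0.623),
    "Short answer": (0.274, 0.658, 0.664, 0.667),
    "Multiple choice": (0.838, 0.924, 0.920, 0.925),
    "Report generation": (0.395, 0.419, 0.441, 0.439),
}


def gains(row):
    """Absolute gain of each MedM-VL variant over the CT-CHAT baseline."""
    baseline, *others = row
    return [round(score - baseline, 3) for score in others]


# Best-scoring MedM-VL variant per task.
best = {
    task: variants[max(range(len(variants)), key=lambda i: row[i + 1])]
    for task, row in scores.items()
}

for task, row in scores.items():
    print(f"{task}: gains over CT-CHAT {gains(row)}, best variant {best[task]}")
```

On these numbers every variant scores above CT-CHAT on every task; 2D+Attn is highest on the three question-answering tasks and 2D+Avg on report generation.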

Quickstart

Please refer to MedM-VL.

Citation

@inproceedings{shi2025medm,
  title={{MedM-VL}: What Makes a Good Medical {LVLM}?},
  author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Fu, Xiangling and Li, Miao and Wu, Ji},
  booktitle={International Workshop on Agentic AI for Medicine},
  pages={290--299},
  year={2025},
  organization={Springer}
}