MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation
This repository hosts the official trained model checkpoints for MedCLIPSeg, a vision–language framework for medical image segmentation built on top of CLIP.
The released checkpoints correspond exactly to the experiments reported in our paper and are provided for evaluation and reproducibility purposes only.
Model Overview
- Backbone: UniMedCLIP ViT-B/16
- Task: Medical Image Segmentation
- Modalities: Ultrasound, MRI, Endoscopy, Dermoscopy, X-ray
- Training Regimes:
  - Data-efficiency evaluation
  - Fully supervised learning
  - Domain generalization
Reproducing Paper Results
Step 1: Download the Checkpoints
Download the checkpoints here. Then create a directory named outputs_medclipseg at the root of the project and place the downloaded checkpoint folders inside it, so that the directory structure matches the following layout:
outputs_medclipseg/
├── BUSI/
├── BTMRI/
├── ISIC/
├── Kvasir/
├── Covid19/
├── EUS/
└── ...
Each folder contains the trained UniMedCLIP-based MedCLIPSeg checkpoints for that dataset.
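Before running evaluation, it can help to verify that the layout above is in place. The snippet below is a minimal sanity check, assuming the dataset folder names listed above; the folders themselves must come from the downloaded checkpoint archive:

```python
import os

# Dataset folders expected under outputs_medclipseg/ (names from the layout
# above; the actual contents come from the downloaded checkpoint archive).
EXPECTED = ["BUSI", "BTMRI", "ISIC", "Kvasir", "Covid19", "EUS"]

def check_layout(root="outputs_medclipseg"):
    """Return the list of expected dataset folders missing under root."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

missing = check_layout()
if missing:
    print("missing checkpoint folders:", ", ".join(missing))
else:
    print("layout OK")
```

If any folder is reported missing, re-check Step 1 before moving on to evaluation.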
Step 2: Run Evaluation
Run the following script to reproduce the results reported in the paper:
bash scripts/reproduce_eval.sh
This script automatically loads the corresponding checkpoints and evaluates them on the appropriate test sets.
Outputs
Evaluation outputs (segmentations and uncertainty maps) are written to:
outputs_medclipseg/<DATASET>/seg_results/
outputs_medclipseg/<DATASET>/unc_results/
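For downstream analysis, the per-dataset result folders can be enumerated with a small helper. This is only a sketch: the two subdirectory names come from the paths above, while the file formats inside them (e.g. PNG masks, NumPy arrays) are not specified here and are an assumption:

```python
import os

def list_results(root: str, dataset: str):
    """Return sorted filenames from a dataset's seg_results/ and unc_results/
    folders (folder names from this README; file formats may vary)."""
    seg_dir = os.path.join(root, dataset, "seg_results")
    unc_dir = os.path.join(root, dataset, "unc_results")
    segs = sorted(os.listdir(seg_dir)) if os.path.isdir(seg_dir) else []
    uncs = sorted(os.listdir(unc_dir)) if os.path.isdir(unc_dir) else []
    return segs, uncs

# Example: count segmentation and uncertainty outputs for one dataset.
segs, uncs = list_results("outputs_medclipseg", "BUSI")
print(len(segs), "segmentations,", len(uncs), "uncertainty maps")
```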
Acknowledgment of Foundation Models
The underlying vision–language models used in this repository were introduced in prior work. We gratefully acknowledge the original authors:
PubMedCLIP
Eslami, Sedigheh, Gerard de Melo, and Christoph Meinel. "Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?" arXiv preprint arXiv:2112.13906 (2021).
UniMedCLIP
Khattak, Muhammad Uzair, et al. "UniMed-CLIP: Towards a unified image-text pretraining paradigm for diverse medical imaging modalities." arXiv preprint arXiv:2412.10372 (2024).
For completeness and reproducibility, this repository also includes the original pretrained checkpoints of these foundation models under the checkpoints/ directory, exactly as released by their respective authors.
All MedCLIPSeg checkpoints are adaptations built on top of these pretrained models and are released strictly for research and non-commercial use, in accordance with their respective licenses.
Citation
If you use these checkpoints in your research, please cite:
@inproceedings{koleilat2026medclipseg,
title = {MedCLIPSeg: Probabilistic Vision--Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation},
author = {Koleilat, Taha and Asgariandehkordi, Hojat and Nejatimanzari, Omid and Barile, Berardino and Xiao, Yiming and Rivaz, Hassan},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
year = {2026}
}