MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation
This repository hosts the official trained model checkpoints for MedCLIPSeg, a vision–language framework for medical image segmentation built on top of CLIP.
The released checkpoints correspond exactly to the experiments reported in our paper and are provided for evaluation and reproducibility purposes only.
Model Overview
- Backbone: UniMedCLIP ViT-B/16
- Task: Medical Image Segmentation
- Modalities: Ultrasound, MRI, Endoscopy, Dermoscopy, X-ray
- Training Regimes:
  - Data-efficiency evaluation
  - Fully supervised learning
  - Domain generalization
Reproducing Paper Results
Step 1: Download the Checkpoints
Download the checkpoints here. Then create a directory named outputs_medclipseg at the root of the project and place the downloaded checkpoint folders inside it, so that the directory structure matches the following layout:
outputs_medclipseg/
├── BUSI/
├── BTMRI/
├── ISIC/
├── Kvasir/
├── Covid19/
├── EUS/
└── ...
Each folder contains the trained UniMedCLIP-based MedCLIPSeg checkpoints for that dataset.
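Before running evaluation, it can help to verify that the layout above is in place. The snippet below is a minimal sanity check, assuming the dataset folder names listed above; the folders themselves must come from the downloaded checkpoint archive:

```python
import os

# Dataset folders expected under outputs_medclipseg/ (names from the layout
# above; the actual contents come from the downloaded checkpoint archive).
EXPECTED = ["BUSI", "BTMRI", "ISIC", "Kvasir", "Covid19", "EUS"]

def check_layout(root="outputs_medclipseg"):
    """Return the list of expected dataset folders missing under root."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

missing = check_layout()
if missing:
    print("missing checkpoint folders:", ", ".join(missing))
else:
    print("layout OK")
```

If any folder is reported missing, re-check Step 1 before moving on to evaluation.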
Step 2: Run Evaluation
Run the following script to reproduce the results reported in the paper:
bash scripts/reproduce_eval.sh
This script automatically loads the corresponding checkpoints and evaluates them on the appropriate test sets.
Outputs
Evaluation outputs (segmentations and uncertainty maps) are written to:
outputs_medclipseg/<DATASET>/seg_results/
outputs_medclipseg/<DATASET>/unc_results/
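For downstream analysis, the per-dataset result folders can be enumerated with a small helper. This is only a sketch: the two subdirectory names come from the paths above, while the file formats inside them (e.g. PNG masks, NumPy arrays) are not specified here and are an assumption:

```python
import os

def list_results(root: str, dataset: str):
    """Return sorted filenames from a dataset's seg_results/ and unc_results/
    folders (folder names from this README; file formats may vary)."""
    seg_dir = os.path.join(root, dataset, "seg_results")
    unc_dir = os.path.join(root, dataset, "unc_results")
    segs = sorted(os.listdir(seg_dir)) if os.path.isdir(seg_dir) else []
    uncs = sorted(os.listdir(unc_dir)) if os.path.isdir(unc_dir) else []
    return segs, uncs

# Example: count segmentation and uncertainty outputs for one dataset.
segs, uncs = list_results("outputs_medclipseg", "BUSI")
print(len(segs), "segmentations,", len(uncs), "uncertainty maps")
```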
Acknowledgment of Foundation Models
The underlying vision–language models used in this repository were introduced in prior work. We gratefully acknowledge the original authors:
PubMedCLIP
Eslami, Sedigheh, Gerard de Melo, and Christoph Meinel. "Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?" arXiv preprint arXiv:2112.13906 (2021).
UniMedCLIP
Khattak, Muhammad Uzair, et al. "UniMed-CLIP: Towards a unified image-text pretraining paradigm for diverse medical imaging modalities." arXiv preprint arXiv:2412.10372 (2024).
For completeness and reproducibility, this repository also includes the original pretrained checkpoints of these foundation models under the checkpoints/ directory, exactly as released by their respective authors.
All MedCLIPSeg checkpoints are adaptations built on top of these pretrained models and are released strictly for research and non-commercial use, in accordance with their respective licenses.
Citation
If you use these checkpoints in your research, please cite:
@inproceedings{koleilat2026medclipseg,
title = {MedCLIPSeg: Probabilistic Vision--Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation},
author = {Koleilat, Taha and Asgariandehkordi, Hojat and Nejatimanzari, Omid and Barile, Berardino and Xiao, Yiming and Rivaz, Hassan},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
year = {2026}
}