YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
D-VST:
Diffusion Transformer for Pathology-Correct Tone-Controllable Cross-Dye Virtual Staining of Whole Slide Images
π€ Model
| Task | Model |
|---|---|
| HE2IHC | π€HE2IHC |
| FFPE2HE | π€FFPE2HE |
| HE2mIHC | π€HE2mIHC |
Overview
Diffusion-based virtual staining methods of histopathology images have demon strated outstanding potential for stain normalization and cross-dye staining (e.g., hematoxylin-eosin to immunohistochemistry). However, achieving pathology correct cross-dye virtual staining with versatile tone controls poses significant challenges due to the difficulty of decoupling the given pathology and tone con ditions. This issue would cause non-pathologic regions to be mistakenly stained like pathologic ones, and vice versa, which we term βpathology leakage.β To address this issue, we propose diffusion virtual staining Transformer (D-VST), a new framework with versatile tone control for cross-dye virtual staining.
Specifically, we introduce a pathology encoder in conjunction with a tone encoder, combined with a two-stage curriculum learning scheme that decouples pathology and tone conditions, to enable tone control while eliminating pathology leakage.
Further, to extend our method for billion-pixel whole slide image (WSI) staining, we introduce a novel frequency-aware adaptive patch sampling strategy for high-quality yet effi cient inference of ultra-high resolution images in a zero-shot manner.
Integrating these two innovative components facilitates a pathology-correct, tone-controllable, cross-dye WSI virtual staining process. Extensive experiments on three virtual staining tasks that involve translating between four different dyes demonstrate the superiority of our approach in generating high-quality and pathologically accurate images compared to existing methods based on generative adversarial networks and diffusion models.
Preparation
Environments
git clone git@github.com:yangshurong/D-VST.git
cd D-VST
conda create -n D_VST python=3.10 -y
conda activate D_VST
pip install -r requirement.txt
Downloading Pretrain Weights
mkdir ./weights
HF_ENDPOINT="https://hf-mirror.com" huggingface-cli download \
--local-dir-use-symlinks False \
--resume-download PixArt-alpha/PixArt-XL-2-512x512 \
--local-dir ./weights/dvst_pretrained
HF_ENDPOINT="https://hf-mirror.com" huggingface-cli download \
--local-dir-use-symlinks False \
--resume-download lambdalabs/sd-image-variations-diffusers \
--include "feature_extractor/**" --include "image_encoder/**" \
--local-dir ./weights/dvst_pretrained
HF_ENDPOINT="https://hf-mirror.com" huggingface-cli download \
--local-dir-use-symlinks False \
--resume-download yangshurong/D-VST \
--local-dir ./weights/dvst_pretrained
Inference
Considering the large number of patches in inference, we support multi-GPU inference for acceleration. You can also use a single GPU for inference.
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --config_file ./configs/accelerate_deepspeed.yaml \
--main_process_port 29510 \
--num_processes 2 \
eval.py \
--config configs/eval/infer2_HE2IHC.yaml
By modifying "checkpoint_path" in "configs/eval/infer2_HE2IHC.yaml" to other types of model weight paths, you can infer different tasks (such as FFPE2HE or HE2mIHC).
Training
We also support multi-GPU training.
1. Preparing Your Data
To train the model, you need to organize paired sources (e.g. H&E) and target (e.g. IHC) image patches in a consistent directory structure. Each whole-slide image (WSI) should have a corresponding folder under both modalities. The following example uses HE2IHC.
data/HE2IHC
βββ HE
β βββ WSI1
β β βββ patch1.png
β β βββ patch2.png
β βββ WSI2
β β βββ patch1.png
β β βββ patch2.png
β βββ WSI3
β βββ patch1.png
β βββ patch2.png
βββ IHC
βββ WSI1
β βββ patch1.png
β βββ patch2.png
βββ WSI2
β βββ patch1.png
β βββ patch2.png
βββ WSI3
βββ patch1.png
βββ patch2.png
Data Requirements:
One-to-one correspondence: Each patch in HE/WSIx/ must have a corresponding patch with the same filename in IHC/WSIx/.
Image resolution: Image patches can be of any resolution, but excessively large images are not recommended due to increased memory usage and slower training. For most use cases, patch sizes in the range of 1024Γ1024 to 2048Γ2048 pixels provide a good balance between detail and efficiency.
For virtual staining between other types of data, you can modify the training_data in the training YAML file, specifying "pose_dir" (sources) and "target_dir" (target) with the corresponding data types. For example, in the FFPE2HE task, "pose_dir" is FFPE and "target_dir" is HE.
2. Training Stage 1
CUDA_VISIBLE_DEVICES=0 accelerate launch --config_file ./configs/accelerate_deepspeed.yaml \
--main_process_port 29510 \
--num_processes 1 \
train.py \
--config configs/training/train1_HE2IHC.yaml
You will see a visual representation of the inference results in ./TrainResult, and the saved model weights in ./Checkpoint.
3. Training Stage 2
Before training begins, you need to replace "checkpoint_path" in "configs/training/train2_HE2IHC.yaml" with the model weights path obtained from stage 1 training.
CUDA_VISIBLE_DEVICES=0 accelerate launch --config_file ./configs/accelerate_deepspeed.yaml \
--main_process_port 29510 \
--num_processes 1 \
train.py \
--config configs/training/train2_HE2IHC.yaml
Finally, you can use the model weights obtained in stage 2 for inference. This can be done by modifying the "checkpoint_path" in "configs/eval/infer2_HE2IHC.yaml" to the model weights path obtained during stage 2 training.
NOTE
For users with low GPU VRAM, you can use multiple GPUs and leverage Deepseed's zero2/3 capability to reduce VRAM consumption. We have already set the default setting to use zero2 in configs/accelerate_deepspeed.yaml. You can also enable gradient checkpointing by setting "gradient_checkpointing" to true in the training YAML file, although this will slow down training.
BibTeX
We would be honored if our work could be of assistance to you. Your citations and stars of this repository are greatly appreciated.
@inproceedings{D-VST,
title={D-VST: Diffusion Transformer for Pathology-Correct Tone-Controllable Cross-Dye Virtual Staining of Whole Slide Images},
year={2025},
author={Shurong, Yang and Wei, Dong and Hu, Yihuang and Peng, Qiong and Liu, Hong and Huang, Yawen and Wu, Xian and Zheng, Yefeng and Wang, Liansheng},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
url={https://openreview.net/pdf?id=jl0O0MYLyh},
}
