
Text-to-CT Model Weights

Checkpoints for “Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining” (Molino et al., 2025).

Model Card for Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining

Model Description

  • Authors: Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, Paolo Soda, Valerio Guarrasi
  • Model type: 3D latent diffusion (RFlow) + 3D VAE + CLIP3D text encoder for CT generation.
  • License: Apache 2.0 (same as the code release).
  • Sources: Code https://github.com/cosbidev/Text2CT | Paper https://arxiv.org/abs/2506.00633
  • Demo: Use diff_model_demo.py from the code release for a one-off generation from text.

Intended Use

  • Direct use: Research/experimentation on text-conditioned 3D CT synthesis; generating synthetic data for benchmarking or augmentation.
  • Downstream use: Fine-tuning or integration into broader research pipelines.
  • Out of scope: Clinical decision-making, diagnostic use, or deployment without proper validation and approvals.

Risks & Limitations

  • Trained on CT-RATE; may encode dataset biases and is not validated for clinical use.
  • Synthetic outputs may contain artifacts; do not use for patient care.

Files included

  • autoencoder_epoch273.pt — 3D VAE for latent compression/decoding.
  • unet_rflow_200ep.pt — Diffusion UNet trained with rectified flow.
  • CLIP3D_Finding_Impression_30ep.pt — CLIP3D weights for encoding reports.
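Each filename encodes its component and the number of training epochs. A small stdlib-only helper (hypothetical, not part of the code release) can recover the epoch counts for bookkeeping:

```python
import re

def checkpoint_epochs(filenames):
    """Map each checkpoint filename to the epoch count embedded in it.

    Handles both naming patterns used above: '...epoch273.pt' and
    '...200ep.pt'. Filenames without an epoch marker are skipped.
    """
    pattern = re.compile(r"(\d+)ep|epoch(\d+)")
    epochs = {}
    for name in filenames:
        match = pattern.search(name)
        if match:
            epochs[name] = int(match.group(1) or match.group(2))
    return epochs
```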

How to Get Started (Python)

from huggingface_hub import hf_hub_download
repo_id = "dmolino/text2ct-weights"  # this model repository

autoencoder_path = hf_hub_download(repo_id, "autoencoder_epoch273.pt")
unet_path = hf_hub_download(repo_id, "unet_rflow_200ep.pt")
clip_path = hf_hub_download(repo_id, "CLIP3D_Finding_Impression_30ep.pt")

# Use these in the code release configs:
# trained_autoencoder_path -> autoencoder_path
# existing_ckpt_filepath / model_filename -> unet_path
# clip_weights (for report embeddings) -> clip_path
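For convenience, the mapping above can be collected into a single override dict. This is a sketch only: the key names mirror the comments in this card, but the exact config schema in the code release may differ, so verify them against the repository's config files.

```python
def build_config_overrides(autoencoder_path, unet_path, clip_path):
    """Collect downloaded checkpoint paths under the config keys named above.

    Key names are assumptions taken from the comments in this card;
    check them against the actual configs in the Text2CT code release.
    """
    return {
        "trained_autoencoder_path": autoencoder_path,
        "existing_ckpt_filepath": unet_path,
        "model_filename": unet_path,
        "clip_weights": clip_path,
    }
```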

Training Data (for these weights)

  • CT-RATE dataset (public on Hugging Face) for CT volumes and reports.

Training Procedure (summary)

  • 3D VAE (autoencoder_epoch273.pt): trained to compress CT volumes into a latent space and decode them back.
  • CLIP3D (CLIP3D_Finding_Impression_30ep.pt): contrastively pretrained to align CT volumes with the findings/impression sections of radiology reports.
  • Diffusion UNet (unet_rflow_200ep.pt): trained in the VAE latent space with a rectified-flow objective, conditioned on CLIP3D report embeddings.
  • See the paper for full hyperparameters and schedules.

Evaluation

  • See paper for quantitative and qualitative results.

Further Information

  • See the code repository and paper linked above.

Environmental Impact

  • Not reported. Training used a multi-GPU setup.

Citation

If you use these weights or code, please cite the paper:

@article{molino2025text,
  title={Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining},
  author={Molino, Daniele and Caruso, Camillo Maria and Ruffini, Filippo and Soda, Paolo and Guarrasi, Valerio},
  journal={arXiv preprint arXiv:2506.00633},
  year={2025}
}