IDLM-MDLM
IDLM-MDLM is an Inverse-distilled Diffusion Language Model distilled from the pretrained MDLM OpenWebText checkpoint. It is released with the paper IDLM: Inverse-distilled Diffusion Language Models.
Diffusion language models (DLMs) can produce high-quality text, but standard reverse diffusion requires many sampling steps. IDLM trains a few-step student generator from a pretrained DLM teacher using an inverse distillation objective with an auxiliary fake model. This checkpoint targets fast generation with an absorbing-state (masked) diffusion teacher.
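For intuition about the absorbing-state process the teacher was trained on, here is a minimal sketch of the forward (noising) step: each token is independently replaced by a mask token with probability 1 - alpha(t). The log-linear schedule alpha(t) = 1 - t and the placeholder mask id are assumptions for illustration; the actual MDLM configuration and mask token id may differ.

```python
import random
from typing import List

MASK_ID = -1  # hypothetical placeholder; the real mask id comes from the model's vocabulary


def alpha(t: float) -> float:
    """Assumed log-linear schedule: expected fraction of tokens still clean at time t in [0, 1]."""
    return 1.0 - t


def forward_mask(tokens: List[int], t: float, rng: random.Random) -> List[int]:
    """Absorbing-state noising: keep each token with probability alpha(t), else replace it by MASK_ID."""
    keep = alpha(t)
    return [tok if rng.random() < keep else MASK_ID for tok in tokens]


rng = random.Random(0)
clean = [464, 3290, 318, 257, 1332]  # arbitrary token ids
print(forward_mask(clean, 0.5, rng))  # roughly half of the positions become MASK_ID
```

Once a token is masked it stays masked for all larger t, which is what "absorbing" refers to; a sampler runs this process in reverse.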
- Project page: https://david-cripto.github.io/idlm-project-page/
- Code: https://github.com/David-cripto/IDLM
- Paper: https://arxiv.org/abs/2602.19066
Model Details
- Model family: IDLM, discrete diffusion language model
- Teacher checkpoint: kuleshov-group/mdlm-owt
- Diffusion type: absorbing-state / masked diffusion
- Training data: OpenWebText
- Tokenizer: GPT-2 tokenizer
- Context length: 1024 tokens
- Parameters: 169,627,250
- Tensor type: F32 Safetensors
- Architecture config: 12 blocks, 12 heads, hidden size 768, conditioning dimension 128, dropout 0.1
- License: MIT
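As a quick sanity check on the numbers above, the following sketch converts the parameter count into raw weight memory and derives the per-head attention dimension. It counts weights only (no activations, optimizer state, or file headers); the BF16 figure is a hypothetical half-precision copy, not a released artifact.

```python
N_PARAMS = 169_627_250   # parameter count from the model card
HIDDEN, HEADS = 768, 12  # architecture config from the model card


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory for the raw weights only."""
    return n_params * bytes_per_param


fp32 = weight_bytes(N_PARAMS, 4)  # the checkpoint is stored as F32
bf16 = weight_bytes(N_PARAMS, 2)  # hypothetical half-precision copy
head_dim = HIDDEN // HEADS

print(f"FP32 weights: {fp32 / 1e6:.1f} MB ({fp32 / 2**20:.1f} MiB)")  # 678.5 MB (647.1 MiB)
print(f"BF16 weights: {bf16 / 1e6:.1f} MB")                           # 339.3 MB
print(f"Per-head dimension: {head_dim}")                               # 64
```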
Intended Use
This checkpoint is intended for research on discrete diffusion language models, few-step diffusion sampling, and reproduction of the IDLM paper experiments.
Installation
The sampling code depends on CUDA and FlashAttention.
```bash
git clone https://github.com/David-cripto/IDLM.git
cd IDLM
conda create -n idlm python=3.12
conda activate idlm
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
```
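After installing, a small check like the one below can confirm that the pinned FlashAttention build is present and that PyTorch sees a GPU before you run the sampler. This is a hypothetical helper, not part of the IDLM repository; it assumes `torch` and `transformers` are installed via `requirements.txt` and does not assume specific versions for them.

```python
import importlib.metadata as md
import sys
from typing import Dict, Optional

# flash_attn is pinned in the install steps; torch/transformers are assumed (unpinned) dependencies.
EXPECTED: Dict[str, Optional[str]] = {
    "flash_attn": "2.7.4.post1",
    "torch": None,
    "transformers": None,
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return md.version(dist)
    except md.PackageNotFoundError:
        return None


def cuda_available() -> bool:
    try:
        import torch  # imported lazily so the check still runs without torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def check_env() -> Dict[str, object]:
    report: Dict[str, object] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "cuda": cuda_available(),
    }
    for dist, pinned in EXPECTED.items():
        found = installed_version(dist)
        ok = found is not None and (pinned is None or found == pinned)
        report[dist] = {"installed": found, "ok": ok}
    return report


if __name__ == "__main__":
    for key, value in check_env().items():
        print(f"{key}: {value}")
```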
Loading the Checkpoint
The Hugging Face repository contains custom model code, so pass `trust_remote_code=True` when loading it.
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "kekchpek/idlm-mdlm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```
Loading through `AutoModelForMaskedLM` gives you only the denoising network, not a sampler. For text generation, use the sampler in the official IDLM repository.
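The repository's sampler is not reproduced here, but the idea behind few-step ancestral sampling for absorbing-state diffusion fits in a few lines: start from an all-mask sequence and, at each step from t to s, reveal each still-masked position with probability (alpha(s) - alpha(t)) / (1 - alpha(t)) by sampling from the denoiser's prediction. The denoiser interface, the log-linear schedule, the mask id, and the simple caching rule are all assumptions; this is not the IDLM `ancestral_cache` implementation.

```python
import random
from typing import Callable, List, Sequence

MASK_ID = -1  # hypothetical placeholder for the mask token id

# Hypothetical denoiser interface: maps the current (partially masked) sequence to one
# probability vector over the vocabulary per position. It stands in for the network.
Denoiser = Callable[[List[int]], List[Sequence[float]]]


def alpha(t: float) -> float:
    return 1.0 - t  # assumed log-linear schedule


def ancestral_sample(denoise: Denoiser, length: int, steps: int,
                     vocab_size: int, rng: random.Random) -> List[int]:
    x = [MASK_ID] * length  # start fully masked (t = 1)
    probs, dirty = None, True
    for i in range(steps, 0, -1):
        t, s = i / steps, (i - 1) / steps
        # Probability that a still-masked token is revealed when moving from t to s.
        p_reveal = (alpha(s) - alpha(t)) / (1.0 - alpha(t))
        if dirty:
            # Cache: re-run the denoiser only if a token changed since the last call.
            # Valid only if the denoiser ignores t, which is an assumption here.
            probs = denoise(x)
            dirty = False
        for pos in range(length):
            if x[pos] == MASK_ID and rng.random() < p_reveal:
                x[pos] = rng.choices(range(vocab_size), weights=probs[pos])[0]
                dirty = True  # revealed tokens stay fixed from now on
    return x  # on the last step s = 0, so p_reveal = 1 and no masks remain


def toy_denoiser(x: List[int]) -> List[Sequence[float]]:
    return [[1.0] * 10 for _ in x]  # uniform over a 10-token vocabulary


print(ancestral_sample(toy_denoiser, length=8, steps=4, vocab_size=10, rng=random.Random(0)))
```

With fewer steps, each step reveals more tokens at once from a single denoiser call, which is why a teacher trained for many small steps degrades at low step counts and why a distilled few-step student is useful.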
Generate Samples
```bash
mkdir -p samples
python -m main \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=mdlm \
  algo.backbone=hf_dit \
  eval.checkpoint_path=kekchpek/idlm-mdlm \
  sampling.steps=16 \
  sampling.num_sample_batches=10 \
  sampling.predictor=ancestral_cache \
  sampling.noise_removal=ancestral \
  +wandb.offline=true \
  eval.generated_samples_path=samples/idlm_mdlm_16steps.json
```
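To sweep over sampling steps, rerun the command with different values of `sampling.steps` (for example 4, 8, 16, and 32) and a matching `eval.generated_samples_path`.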
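A sweep over the step counts in the evaluation table could be scripted as below. It reuses the overrides from the command above and varies only `sampling.steps` and the output path. The helper names are hypothetical, and the script must be run from the root of the IDLM checkout with the `idlm` environment active.

```python
import os
import shlex
import subprocess
import sys
from typing import Iterable, List

# Overrides copied from the sampling command above; steps and output path are added per run.
BASE_OVERRIDES = [
    "mode=sample_eval",
    "loader.batch_size=2",
    "loader.eval_batch_size=8",
    "data=openwebtext-split",
    "algo=mdlm",
    "algo.backbone=hf_dit",
    "eval.checkpoint_path=kekchpek/idlm-mdlm",
    "sampling.num_sample_batches=10",
    "sampling.predictor=ancestral_cache",
    "sampling.noise_removal=ancestral",
    "+wandb.offline=true",
]


def sample_command(steps: int) -> List[str]:
    return [
        sys.executable, "-m", "main",
        *BASE_OVERRIDES,
        f"sampling.steps={steps}",
        f"eval.generated_samples_path=samples/idlm_mdlm_{steps}steps.json",
    ]


def sweep(step_values: Iterable[int] = (4, 8, 16, 32), dry_run: bool = True) -> List[List[str]]:
    """Print, and optionally run, one sampling command per step count."""
    commands = [sample_command(s) for s in step_values]
    if not dry_run:
        os.makedirs("samples", exist_ok=True)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    sweep(dry_run=True)  # set dry_run=False inside the IDLM checkout to actually sample
```

Passing arguments as a list (rather than a shell string) avoids quoting issues with overrides such as `+wandb.offline=true`.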
Evaluation
The paper reports generation perplexity (GenPPL, lower is better) and sample entropy (higher is better) for text generated by the OpenWebText-trained model. By default, the released evaluation code scores GenPPL with gpt2-large.
| Sampling steps | GenPPL (lower is better) | Entropy (higher is better) |
|---|---|---|
| 32 | 20.37 | 5.23 |
| 16 | 32.74 | 5.42 |
| 8 | 79.42 | 5.61 |
| 4 | 310.38 | 5.78 |
For comparison, the MDLM teacher is reported at 1024 steps with GenPPL 41.29 and entropy 5.28.
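The two metrics can be made concrete with a short sketch. It assumes GenPPL is the exponentiated mean per-token negative log-likelihood that the scoring model (gpt2-large by default) assigns to the generated tokens, and that entropy is the unigram token entropy within each sample, averaged over samples. Both definitions are assumptions about the evaluation code, and computing the NLLs themselves (running gpt2-large) is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def gen_ppl(token_nlls: Sequence[float]) -> float:
    """exp(mean per-token negative log-likelihood); NLLs in nats from the scoring LM."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def sample_entropy(token_ids: Sequence[int]) -> float:
    """Entropy (nats) of the unigram token distribution within one sample."""
    n = len(token_ids)
    return -sum((c / n) * math.log(c / n) for c in Counter(token_ids).values())


def mean_entropy(samples: List[Sequence[int]]) -> float:
    """Average per-sample entropy over a batch of generated samples."""
    return sum(sample_entropy(s) for s in samples) / len(samples)


print(round(sample_entropy([1, 2, 3, 4]), 3))          # 1.386 (= ln 4)
print(round(gen_ppl([math.log(20.0)] * 3), 6))         # 20.0
```

Low entropy flags repetitive samples, which can score deceptively well on GenPPL; that is why the table reports both.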
Training Summary
IDLM-MDLM was trained by initializing both the student and the fake model from the pretrained MDLM teacher, then alternating between:
- Updating the fake model on student-generated samples using the teacher diffusion loss.
- Updating the student using the teacher-fake loss gap.
This follows the inverse distillation objective described in the paper and uses the absorbing-state masked diffusion formulation.
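The alternating schedule above can be outlined as a framework-agnostic loop. Everything here is hypothetical: the callables stand in for the student sampler, the teacher and fake diffusion losses, and the optimizer steps, and `fake_steps_per_iter` is an assumed knob. The gap is written as teacher loss minus fake loss for illustration; the exact objective, including how gradients pass through discrete student samples, is defined in the paper and not reproduced here.

```python
from typing import Any, Callable, List, Tuple

Batch = Any  # whatever the student generator produces (e.g. token sequences)


def idlm_training_sketch(
    num_iters: int,
    sample_student: Callable[[], Batch],      # hypothetical: draw samples from the student
    teacher_loss: Callable[[Batch], float],   # hypothetical: teacher diffusion loss on samples
    fake_loss: Callable[[Batch], float],      # hypothetical: fake-model diffusion loss on samples
    update_fake: Callable[[float], None],     # hypothetical: optimizer step for the fake model
    update_student: Callable[[float], None],  # hypothetical: optimizer step for the student
    fake_steps_per_iter: int = 1,
) -> List[Tuple[str, int]]:
    """Alternate fake-model and student updates; returns a log of which phase ran."""
    log: List[Tuple[str, int]] = []
    for it in range(num_iters):
        # Phase 1: fit the fake model to the current student's samples with the diffusion loss.
        for _ in range(fake_steps_per_iter):
            x = sample_student()
            update_fake(fake_loss(x))
            log.append(("fake", it))
        # Phase 2: update the student on the teacher-fake loss gap over fresh student samples.
        x = sample_student()
        gap = teacher_loss(x) - fake_loss(x)
        update_student(gap)
        log.append(("student", it))
    return log
```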
Citation
```bibtex
@article{li2026idlm,
  title={IDLM: Inverse-distilled Diffusion Language Models},
  author={Li, David and Gushchin, Nikita and Abulkhanov, Dmitry and Moulines, Eric and Oseledets, Ivan and Panov, Maxim and Korotin, Alexander},
  journal={arXiv preprint arXiv:2602.19066},
  year={2026}
}
```