	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: image-text-to-text
	base_model: Qwen/Qwen3-VL-8B-Instruct
	tags:
	- agent
	- image-generation
	- tool-use
	- visual-reasoning
	- self-distillation
	- grpo
	- reinforcement-learning
	- multimodal
	- qwen3-vl
	datasets:
	- MeiGen-AI/GenEvolve-Data-Bench
	---

	<div align="center">

	<img src="assets/logo_genevolve.png" alt="GenEvolve" width="160">

	<h1>GenEvolve</h1>

	<p><strong><em>Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation</em></strong></p>

	<p>
	<a href="https://arxiv.org/abs/2605.21605">
	<img alt="Paper" src="https://img.shields.io/badge/📄_Paper-arXiv:2605.21605-b31b1b"></a>
	<a href="https://ephemeral182.github.io/GenEvolve/">
	<img alt="Project Page" src="https://img.shields.io/badge/🌐_Project-Page-1f6feb"></a>
	<a href="https://github.com/MeiGen-AI/GenEvolve">
	<img alt="Code" src="https://img.shields.io/badge/💾_GitHub-Code-181717"></a>
	<a href="https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench">
	<img alt="Dataset" src="https://img.shields.io/badge/🤗_Dataset-GenEvolve--Data-FFD21E"></a>
	</p>

	</div>

	This repository hosts the GenEvolve agent policy — a Qwen3-VL-8B-Instruct backbone fine-tuned and self-evolved into a tool-orchestrated image-generation agent. Given a user request, the agent issues web/image searches, retrieves visual references, activates internal generation knowledge, and emits an executable prompt-reference program `z = (gen_prompt, reference_images)` that drives any reference-conditioned downstream generator (Qwen-Image-Edit, Nano Banana Pro, ...).

	<div align="center">
	<img src="assets/teaser.jpg" alt="GenEvolve teaser" width="100%">

	<p><em>The same trained agent policy paired with two reference-conditioned generators ⟶<br>
	<strong>Qwen-Image-Edit (open)</strong>  ·  <strong>Nano Banana Pro (strong)</strong></em></p>
	</div>

	---

	## ✨ Highlights

	- Tool-orchestrated trajectories. The agent calls `search`, `image_search`, and `query_knowledge` (8 callable generation skills) before producing a final program `z = (gen_prompt, reference_images)`.
	- Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled at the token level into the deployed student, so no runtime memory is needed at inference.
	- Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).
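
The best-vs-worst pairing mentioned in the second highlight can be pictured with a small sketch. It assumes every sampled trajectory already carries a scalar outcome score; the `Trajectory` class, its fields, the `min_gap` threshold, and `select_pair` are hypothetical names for illustration, not the repository's API, and the token-level distillation step itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trajectory:
    """Hypothetical container for one sampled agent rollout."""
    steps: List[str]  # serialized think / tool_call / answer turns
    score: float      # assumed scalar score of the rendered outcome


def select_pair(
    group: List[Trajectory], min_gap: float = 0.0
) -> Optional[Tuple[Trajectory, Trajectory]]:
    """Return (best, worst) among rollouts sampled for the same request.

    Returns None when there are fewer than two rollouts or when the score
    gap does not exceed min_gap (an assumed filter, so that ties carry no
    training signal).
    """
    if len(group) < 2:
        return None
    best = max(group, key=lambda t: t.score)
    worst = min(group, key=lambda t: t.score)
    if best.score - worst.score <= min_gap:
        return None
    return best, worst
```

Pairs selected this way would then feed whatever distillation objective the training code uses; see the paper and GitHub repo for the actual procedure.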

	## 📊 Headline Results

	### GenEvolve-Bench (KScore, held-out split)

	| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
	|---|---|---:|---:|---:|
	| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
	| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
	| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
	| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
	| GenEvolve (Ours) | Qwen-Image-Edit-2511 | 0.3663 | 0.3410 | 0.3990 |
	| GenEvolve (Ours) | Nano Banana Pro | 0.5739 | 0.5669 | 0.5830 |
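
As a quick arithmetic check on the table above, the sketch below recomputes the KScore gain of GenEvolve over Gen-Searcher 8B for each generator. The values are copied from the table; the dictionary layout and the `kscore_gain` helper are illustrative only.

```python
# KScore values copied from the GenEvolve-Bench table above.
KSCORE = {
    ("Gen-Searcher 8B", "Qwen-Image-Edit-2511"): 0.3493,
    ("Gen-Searcher 8B", "Nano Banana Pro"): 0.5481,
    ("GenEvolve", "Qwen-Image-Edit-2511"): 0.3663,
    ("GenEvolve", "Nano Banana Pro"): 0.5739,
}


def kscore_gain(generator: str) -> float:
    """Absolute KScore gain of GenEvolve over Gen-Searcher 8B on one generator."""
    gain = KSCORE[("GenEvolve", generator)] - KSCORE[("Gen-Searcher 8B", generator)]
    return round(gain, 4)


for generator in ("Qwen-Image-Edit-2511", "Nano Banana Pro"):
    print(f"{generator}: +{kscore_gain(generator):.4f}")
# Qwen-Image-Edit-2511: +0.0170
# Nano Banana Pro: +0.0258
```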

	### WISE Benchmark (WiScore, six knowledge categories)

	| Model | Cultural | Time | Space | Biology | Physics | Chemistry | Overall |
	|---|---:|---:|---:|---:|---:|---:|---:|
	| GPT-4o | 0.81 | 0.71 | 0.89 | 0.83 | 0.79 | 0.74 | 0.80 |
	| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
	| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | 0.85 | 0.68 | 0.78 |
	| GenEvolve + Qwen-Image-Edit | 0.84 | 0.74 | 0.87 | 0.83 | 0.81 | 0.83 | 0.82 |

	---

	## 🧠 Method Overview

	<p align="center"><img src="assets/overview.png" alt="GenEvolve method overview" width="92%"></p>

	For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.

	---

	## 🖼️ Visual Demos

	<p align="center"><img src="assets/visual_comparison.png" alt="Qualitative comparison" width="100%"></p>

	<p align="center"><sub>Qualitative comparison on representative cases. <span style="color:#D97706">Orange</span> marks external/uncommon knowledge requirements; <span style="color:#2563EB">blue</span> marks internal generation-knowledge requirements.</sub></p>

	### 🎨 Gallery — paired with Nano Banana Pro

	<p align="center"><img src="assets/gallery_nano.jpg" alt="GenEvolve + Nano Banana Pro gallery" width="100%"></p>

	<p align="center"><sub>The same agent policy with Nano Banana Pro as the downstream renderer. Examples cover spatial layout, text rendering, quantity counting, attribute binding, anatomy/pose, creative transfer, material physics, and aesthetic drawing.</sub></p>

	### 🎨 Gallery — paired with Qwen-Image-Edit (open)

	<p align="center"><img src="assets/gallery_qwen.jpg" alt="GenEvolve + Qwen-Image-Edit gallery" width="100%"></p>

	<p align="center"><sub>Same trained policy paired with the open-source Qwen-Image-Edit-2511 renderer; consistent quality across both generators reflects generator-transferable orchestration.</sub></p>

	---

	## 🚀 Quick Start

	The deployed checkpoint is the student policy: it consumes a user prompt and returns a JSON `gen_prompt + reference_images` program through a `<think>/<tool_call>/<answer>` loop. The end-to-end runtime (vLLM serving + agent loop + tools + Qwen/Nano renderers) lives in the [GitHub repo](https://github.com/MeiGen-AI/GenEvolve); the steps below mirror its installation and usage.
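
To make the `<think>/<tool_call>/<answer>` loop concrete, here is a minimal sketch of how such a loop could be driven. It is not the runtime shipped in the GitHub repo: `run_agent`, the `policy` callable, the `tools` registry, and the `max_turns` cap are hypothetical, and the `<tool_call>` payload is assumed to be JSON of the form `{"name": ..., "arguments": {...}}`.

```python
import json
import re
from typing import Callable, Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

Message = Dict[str, str]


def run_agent(
    policy: Callable[[List[Message]], str],
    tools: Dict[str, Callable[..., str]],
    user_prompt: str,
    max_turns: int = 8,
) -> dict:
    """Alternate policy replies and tool results until an <answer> appears."""
    messages: List[Message] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = policy(messages)
        messages.append({"role": "assistant", "content": reply})

        answer = ANSWER_RE.search(reply)
        if answer:
            # Final program: JSON with gen_prompt and reference_images.
            return json.loads(answer.group(1))

        call = TOOL_CALL_RE.search(reply)
        if call is None:
            raise ValueError("reply contained neither <tool_call> nor <answer>")

        # Assumed payload shape: {"name": "...", "arguments": {...}}.
        request = json.loads(call.group(1))
        result = tools[request["name"]](**request["arguments"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("policy produced no <answer> within max_turns")
```

In practice `policy` would wrap a chat-completions call to the served checkpoint, and `tools` would map `search`, `image_search`, and `query_knowledge` to real wrappers.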

	### 1. Install the main GenEvolve runtime

	```bash
	git clone https://github.com/MeiGen-AI/GenEvolve.git
	cd GenEvolve

	conda create -n genevolve python=3.11 -y && conda activate genevolve
	pip install -U pip setuptools wheel packaging psutil ninja
	pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
	pip install --no-build-isolation -r requirements.txt
	pip install -e .
	```

	Qwen-Image-Edit rendering runs as a separate FastAPI service (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use `--backend qwen-image-edit-service`.

	### 2. Serve the agent policy

	```bash
	# Single GPU / single replica.
	MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh

	# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
	MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh
	```

	`TP` shards one model replica across multiple GPUs; `DP` launches multiple replicas; total GPU usage is `TP × DP`.
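
The `TP × DP` rule can be checked before launching with a few lines of Python. This is only an illustrative helper; `total_gpus` and `fits` are hypothetical names and are not part of `scripts/serve_vllm.sh`.

```python
def total_gpus(tp: int, dp: int) -> int:
    """GPUs consumed by a deployment: tp GPUs per replica times dp replicas."""
    if tp < 1 or dp < 1:
        raise ValueError("TP and DP must be positive integers")
    return tp * dp


def fits(tp: int, dp: int, available: int) -> bool:
    """True when the requested layout fits on the available GPUs."""
    return total_gpus(tp, dp) <= available


print(total_gpus(1, 1))  # 1: the single-GPU command above
print(total_gpus(1, 8))  # 8: the 8-replica command above
print(fits(2, 8, 8))     # False: 16 GPUs would be needed on an 8-GPU node
```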

	### 3. End-to-end example

	```bash
	export SERPER_API_KEY="<your_key>"   # required for search / image_search
	export GOOGLE_API_KEY="<your_key>"   # or GEMINI_API_KEY; only for --backend nano-banana-pro

	# Nano Banana Pro renderer
	python examples/quickstart.py \
	    --backend nano-banana-pro \
	    --base-url http://localhost:8000/v1 \
	    --model GenEvolve \
	    --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
	    --output paris.png

	# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
	python examples/quickstart.py \
	    --backend qwen-image-edit-service \
	    --service-url http://your-qwen-service:8001 \
	    --base-url http://localhost:8000/v1 \
	    --model GenEvolve \
	    --output paris_qwen.png
	```

	The agent's final `<answer>` is a JSON object:

	```json
	{
	  "gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
	  "reference_images": [
	    {"img_id": "IMG_001", "note": "what to copy from this image"}
	  ]
	}
	```

	`gen_prompt` MUST refer to selected images using ordinal phrases (`"the first reference image"`) — never raw `IMG_###` ids or URLs. Pass `(gen_prompt, [r["local_path"] for r in reference_images])` to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.
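
The rule above can be enforced when unpacking the answer. The sketch below parses the `<answer>` JSON, rejects a `gen_prompt` that contains raw `IMG_###` ids or URLs, and returns the `(gen_prompt, paths)` pair. The function name and the `local_paths` mapping (from `img_id` to the file your tool wrapper saved) are assumptions for illustration; the actual runtime may track local paths differently.

```python
import json
import re
from typing import Dict, List, Tuple

RAW_ID_RE = re.compile(r"IMG_\d+")
URL_RE = re.compile(r"https?://")


def build_generator_inputs(
    answer_json: str, local_paths: Dict[str, str]
) -> Tuple[str, List[str]]:
    """Turn the agent's answer into (gen_prompt, ordered reference image paths)."""
    program = json.loads(answer_json)
    gen_prompt = program["gen_prompt"]

    # gen_prompt must use ordinal phrases, never raw ids or URLs.
    if RAW_ID_RE.search(gen_prompt) or URL_RE.search(gen_prompt):
        raise ValueError("gen_prompt must not contain raw IMG_### ids or URLs")

    # Keep the order of reference_images so that "the first reference image"
    # lines up with paths[0]; local_paths is a hypothetical img_id -> file map.
    paths = [local_paths[ref["img_id"]] for ref in program["reference_images"]]
    return gen_prompt, paths
```

Preserving list order matters here, because the ordinal phrases in `gen_prompt` are only meaningful relative to the order in which images are handed to the generator.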

	---

	## 🗂️ Related Artifacts

	| Artifact | Link |
	|---|---|
	| Project page | https://ephemeral182.github.io/GenEvolve/ |
	| Paper | [arXiv:2605.21605](https://arxiv.org/abs/2605.21605) |
	| Code | https://github.com/MeiGen-AI/GenEvolve |
	| Training data + benchmark | [MeiGen-AI/GenEvolve-Data-Bench](https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench) |
	| Base model | [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |

	---

	## ⚖️ Intended Use, Limits, Bias

	- Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
	- Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
	- Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.

	---

	## 📑 Citation

	```bibtex
	@misc{chen2026genevolveselfevolvingimagegeneration,
	  title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},
	  author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
	  year={2026},
	  eprint={2605.21605},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV},
	  url={https://arxiv.org/abs/2605.21605},
	}
	```