Z-Image-Turbo

Running on Zero

App Files Files Community

Z-Image-Turbo / CLAUDE.md

tchung1970

Add Korean localization and CLAUDE.md documentation

47e50c0 9 days ago

preview code

raw

history blame contribute delete

3.91 kB

A newer version of the Gradio SDK is available: 6.0.2

Upgrade

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

Z-Image-Turbo is a Gradio-based Hugging Face Space for image generation using the Z-Image diffusion transformer model. It provides a web interface for text-to-image generation with optional prompt enhancement via API.

Running the Application

Start the Gradio app:

python app.py

The app will launch with MCP server support enabled and be accessible via the Gradio interface.

Environment Variables

Required environment variables (set these before running):

MODEL_PATH: Path or HF model ID (default: "Tongyi-MAI/Z-Image-Turbo")
HF_TOKEN: Hugging Face token for model access
DASHSCOPE_API_KEY: Optional, for prompt enhancement feature (currently disabled in UI)
ENABLE_COMPILE: Enable torch.compile optimizations (default: "true")
ENABLE_WARMUP: Warmup model on startup (default: "true")
ATTENTION_BACKEND: Attention implementation (default: "flash_3")

Architecture

Core Components

app.py - Main application file containing:

Model loading and initialization (load_models, init_app)
Image generation pipeline using ZImagePipeline from diffusers
Gradio UI with resolution presets and generation controls
Optional prompt enhancement via DashScope API (currently disabled in UI)
Zero GPU integration with AoTI (Ahead of Time Inductor) compilation

pe.py - Contains prompt_template for the prompt expander, a Chinese language system prompt that guides LLMs to transform user prompts into detailed visual descriptions suitable for image generation models.

Key Functions

generate(prompt, resolution, seed, steps, shift, enhance, random_seed, gallery_images, progress) (app.py:366)

Main generation function decorated with @spaces.GPU
Processes prompt, applies settings, generates image
Returns updated gallery, seed used
The enhance parameter is currently disabled in the UI but functional in code

load_models(model_path, enable_compile, attention_backend) (app.py:100)

Loads VAE, text encoder, tokenizer, and transformer
Applies torch.compile optimizations if enabled
Configures attention backend (native/flash_3)

warmup_model(pipe, resolutions) (app.py:205)

Pre-warms model for all resolution configurations
Reduces first-generation latency

Resolution System

The app supports two resolution categories (1024 and 1280) with multiple aspect ratios:

1:1, 9:7, 7:9, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16, 21:9, 9:21
Resolutions are stored in RES_CHOICES dict and parsed via get_resolution()

Prompt Enhancement (Currently Disabled)

The PromptExpander and APIPromptExpander classes provide optional prompt enhancement via DashScope API:

Backend: OpenAI-compatible API at dashscope.aliyuncs.com
Model: qwen3-max-preview
System prompt from pe.prompt_template guides detailed visual description generation
UI controls are commented out but underlying code is functional

Dependencies

Install via:

pip install -r requirements.txt

Key dependencies:

gradio (UI framework)
torch, transformers, diffusers (ML models)
spaces (Hugging Face Spaces integration)
openai (for optional prompt enhancement)
Custom diffusers fork from GitHub with Z-Image support

Model Details

Architecture: Single-stream diffusion transformer (Z-Image)
Scheduler: FlowMatchEulerDiscreteScheduler with configurable shift parameter
Precision: bfloat16
Device: CUDA required
Attention: Configurable backend (native or flash_3)

Zero GPU Integration

The app uses Hugging Face Spaces Zero GPU features:

@spaces.GPU decorator on generate function
AoTI (Ahead of Time Inductor) compilation for transformer blocks (app.py:458-459)
Pre-compiled blocks loaded from "zerogpu-aoti/Z-Image" with flash_attention_3 variant