Z-Image-Turbo / CLAUDE.md
tchung1970's picture
Add Korean localization and CLAUDE.md documentation
47e50c0

A newer version of the Gradio SDK is available: 6.0.2

Upgrade

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

Z-Image-Turbo is a Gradio-based Hugging Face Space for image generation using the Z-Image diffusion transformer model. It provides a web interface for text-to-image generation with optional prompt enhancement via API.

Running the Application

Start the Gradio app:

python app.py

The app will launch with MCP server support enabled and be accessible via the Gradio interface.

Environment Variables

Required environment variables (set these before running):

  • MODEL_PATH: Path or HF model ID (default: "Tongyi-MAI/Z-Image-Turbo")
  • HF_TOKEN: Hugging Face token for model access
  • DASHSCOPE_API_KEY: Optional, for prompt enhancement feature (currently disabled in UI)
  • ENABLE_COMPILE: Enable torch.compile optimizations (default: "true")
  • ENABLE_WARMUP: Warmup model on startup (default: "true")
  • ATTENTION_BACKEND: Attention implementation (default: "flash_3")

Architecture

Core Components

app.py - Main application file containing:

  • Model loading and initialization (load_models, init_app)
  • Image generation pipeline using ZImagePipeline from diffusers
  • Gradio UI with resolution presets and generation controls
  • Optional prompt enhancement via DashScope API (currently disabled in UI)
  • Zero GPU integration with AoTI (Ahead of Time Inductor) compilation

pe.py - Contains prompt_template for the prompt expander, a Chinese language system prompt that guides LLMs to transform user prompts into detailed visual descriptions suitable for image generation models.

Key Functions

generate(prompt, resolution, seed, steps, shift, enhance, random_seed, gallery_images, progress) (app.py:366)

  • Main generation function decorated with @spaces.GPU
  • Processes prompt, applies settings, generates image
  • Returns updated gallery, seed used
  • The enhance parameter is currently disabled in the UI but functional in code

load_models(model_path, enable_compile, attention_backend) (app.py:100)

  • Loads VAE, text encoder, tokenizer, and transformer
  • Applies torch.compile optimizations if enabled
  • Configures attention backend (native/flash_3)

warmup_model(pipe, resolutions) (app.py:205)

  • Pre-warms model for all resolution configurations
  • Reduces first-generation latency

Resolution System

The app supports two resolution categories (1024 and 1280) with multiple aspect ratios:

  • 1:1, 9:7, 7:9, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16, 21:9, 9:21
  • Resolutions are stored in RES_CHOICES dict and parsed via get_resolution()

Prompt Enhancement (Currently Disabled)

The PromptExpander and APIPromptExpander classes provide optional prompt enhancement via DashScope API:

  • Backend: OpenAI-compatible API at dashscope.aliyuncs.com
  • Model: qwen3-max-preview
  • System prompt from pe.prompt_template guides detailed visual description generation
  • UI controls are commented out but underlying code is functional

Dependencies

Install via:

pip install -r requirements.txt

Key dependencies:

  • gradio (UI framework)
  • torch, transformers, diffusers (ML models)
  • spaces (Hugging Face Spaces integration)
  • openai (for optional prompt enhancement)
  • Custom diffusers fork from GitHub with Z-Image support

Model Details

  • Architecture: Single-stream diffusion transformer (Z-Image)
  • Scheduler: FlowMatchEulerDiscreteScheduler with configurable shift parameter
  • Precision: bfloat16
  • Device: CUDA required
  • Attention: Configurable backend (native or flash_3)

Zero GPU Integration

The app uses Hugging Face Spaces Zero GPU features:

  • @spaces.GPU decorator on generate function
  • AoTI (Ahead of Time Inductor) compilation for transformer blocks (app.py:458-459)
  • Pre-compiled blocks loaded from "zerogpu-aoti/Z-Image" with flash_attention_3 variant