How to use joyfox/Qwen-Image-Edit-MeiTu with Diffusers:
pip install -U diffusers transformers accelerate
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import load_image

    # switch to "mps" for apple devices
    pipe = DiffusionPipeline.from_pretrained("joyfox/Qwen-Image-Edit-MeiTu", dtype=torch.bfloat16, device_map="cuda")

    prompt = "Turn this cat into a dog"
    input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
    image = pipe(image=input_image, prompt=prompt).images[0]
🌈 Qwen-Image-Edit-MeiTu
Qwen-Image-Edit-MeiTu is a fine-tuned variant of Qwen/Qwen-Image-Edit. Fine-tuning of the DiT (Diffusion Transformer) backbone targets visual consistency, aesthetic quality, and structural alignment in complex edits.
Developed by Valiant Cat AI Lab, this version aims to further close the gap between high-fidelity semantic editing and coherent artistic rendering, achieving a more natural and professional output across a wide range of prompts and subjects.
✨ Key Improvements
- Enhanced Consistency: Utilizes DiT (Diffusion Transformer) fine-tuning to ensure structural stability between input and edited regions, maintaining global spatial coherence.
- Aesthetic Optimization: Trained with aesthetic discriminators and curated aesthetic score datasets, producing more pleasing colors, contrast, and light balance.
- Better Detail Preservation: Improved low-level reconstruction for fine details such as textures, faces, and typography.
- Broader Scene Adaptability: Performs well on portraits, environments, product photos, and illustrations, supporting both semantic and appearance-based editing.
🖼️ Showcase
Below are examples of consistency and aesthetic improvement in complex editing scenarios:
[Images: five input and output comparison pairs]
💬 Recommended Prompts
Try these prompts to explore the model’s strengths:
- “make the lighting soft and cinematic with better balance”
- “enhance the photo’s composition and maintain realism”
- “refine skin tone and texture consistency”
- “improve the global color tone and aesthetic harmony”
- “increase photo realism and clarity without changing content”
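One way to try all of the recommended prompts on the same input is a small loop around the pipeline call. The sketch below assumes `pipe` is a pipeline loaded as in the Diffusers snippet on this card and that its output exposes an `.images` list, as that snippet uses. The names `RECOMMENDED_PROMPTS` and `edit_with_prompts` are hypothetical helpers introduced here for illustration, not part of the model or of Diffusers.

```python
# Hypothetical helper: run one input image through each recommended prompt.
RECOMMENDED_PROMPTS = [
    "make the lighting soft and cinematic with better balance",
    "enhance the photo's composition and maintain realism",
    "refine skin tone and texture consistency",
    "improve the global color tone and aesthetic harmony",
    "increase photo realism and clarity without changing content",
]


def edit_with_prompts(pipe, input_image, prompts=RECOMMENDED_PROMPTS):
    """Return a list of (prompt, edited_image) pairs, one per prompt.

    `pipe` is assumed to be callable as pipe(image=..., prompt=...) and to
    return an object with an `.images` list, matching the card's snippet.
    """
    results = []
    for prompt in prompts:
        edited = pipe(image=input_image, prompt=prompt).images[0]
        results.append((prompt, edited))
    return results
```

With a loaded pipeline, a call such as `edit_with_prompts(pipe, load_image("input.png"))` (where `input.png` is a placeholder path) would return the edited images in prompt order, which makes side-by-side comparison of the prompts straightforward.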
🧩 Integration with ComfyUI
This model works seamlessly with a modified ComfyUI Qwen-Image-Edit workflow.
To edit images, load this model in the workflow's Unet node.
📥 Download Model
Weights available in Safetensors format:
👉 Download Qwen-Image-Edit-MeiTu
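For scripted downloads, the weights can likely be fetched with `huggingface_hub.snapshot_download`, filtered to Safetensors files. This is a minimal sketch under assumptions: the repository id is taken from the Diffusers snippet on this card, the exact file names in the repository are not assumed, and `download_safetensors` with its `fetch` parameter is a hypothetical helper (the `fetch` hook only exists so the function can be exercised offline).

```python
from pathlib import Path

REPO_ID = "joyfox/Qwen-Image-Edit-MeiTu"  # repo id as used in the card's Diffusers snippet


def download_safetensors(local_dir, repo_id=REPO_ID, fetch=None):
    """Download only *.safetensors files from the repo and list them.

    `fetch` defaults to huggingface_hub.snapshot_download; passing another
    callable with the same keyword arguments is a hypothetical testing hook.
    """
    if fetch is None:
        # Imported lazily so the module loads without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        fetch = snapshot_download
    folder = fetch(
        repo_id=repo_id,
        allow_patterns=["*.safetensors"],  # skip non-weight files
        local_dir=str(local_dir),
    )
    # Return the downloaded weight files in a stable order.
    return sorted(Path(folder).rglob("*.safetensors"))
```

Calling `download_safetensors("weights")` would place the files under a local `weights` folder (a placeholder name); for ComfyUI, the resulting file would then be moved to wherever the workflow's Unet node looks for models.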
🧠 Training
This model was trained and optimized by the AI Laboratory of Chongqing Valiant Cat Technology Co., LTD.
Visit https://vvicat.com/ for business collaborations or research partnerships.
📄 Related Paper
This model is part of the Qwen-Edit+ research line and is associated with the following preprint:
Fan Tang, Siyuan Li
Qwen-Edit+: Scaling Image Editing with VLM-Guided Consistency and Aesthetic Preference Distillation.
Research Square, Version 1, 08 April 2026.
DOI: 10.21203/rs.3.rs-9352857/v1
📚 Citation
If you use this model, please cite:
@article{tang2026qweneditplus,
author = {Fan Tang and Siyuan Li},
title = {Qwen-Edit+: Scaling Image Editing with VLM-Guided Consistency and Aesthetic Preference Distillation},
journal = {Research Square},
year = {2026},
doi = {10.21203/rs.3.rs-9352857/v1},
url = {https://doi.org/10.21203/rs.3.rs-9352857/v1}
}
📜 License
Licensed under Apache 2.0.
💼 Join Us
We are hiring research engineers and creative ML practitioners at Chongqing Valiant Cat Technology Co., LTD. Reach out via 📧 tommy@vvicat.com