---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: image-to-image
tags:
- Image-to-Image
- ControlNet
- Diffusers
- QwenImageControlNetPipeline
- Qwen-Image
base_model: Qwen/Qwen-Image
---

# Qwen-Image-ControlNet-Union
This repository provides a unified ControlNet that supports 4 common control types (canny, soft edge, depth, pose) for [Qwen-Image](https://github.com/QwenLM/Qwen-Image).

# Model Card
- This ControlNet consists of 5 double blocks copied from the pretrained transformer layers.
- We train the model from scratch for 50K steps on a dataset of 10M high-quality general and human images.
- We train at 1328x1328 resolution in BFloat16 with batch size 64 and learning rate 4e-5, and set the text drop ratio to 0.10.
- This model supports multiple control modes, including canny, soft edge, depth, and pose. You can use it just like a normal ControlNet.

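The block-copy initialization above can be illustrated with a small sketch. The key names and `state_dict` layout below are hypothetical stand-ins, not the actual Qwen-Image parameter names; the real initialization happens inside diffusers:

```python
NUM_CONTROL_BLOCKS = 5  # the ControlNet copies 5 double blocks

def init_controlnet_state(transformer_state, num_blocks=NUM_CONTROL_BLOCKS):
    """Copy parameters of the first `num_blocks` transformer blocks
    (illustrative only; key names here are made up)."""
    prefixes = tuple(f"transformer_blocks.{i}." for i in range(num_blocks))
    return {k: v for k, v in transformer_state.items() if k.startswith(prefixes)}

# Toy state dict standing in for real tensors: 60 hypothetical blocks.
full_state = {f"transformer_blocks.{i}.attn.weight": i for i in range(60)}
cn_state = init_controlnet_state(full_state)
```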
# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
  <td><img src="./conds/canny1.png" alt="canny"></td>
  <td><img src="./outputs/canny1.png" alt="canny"></td>
</tr>
<tr>
  <td><img src="./conds/soft_edge.png" alt="soft_edge"></td>
  <td><img src="./outputs/soft_edge.png" alt="soft_edge"></td>
</tr>
<tr>
  <td><img src="./conds/depth.png" alt="depth"></td>
  <td><img src="./outputs/depth.png" alt="depth"></td>
</tr>
<tr>
  <td><img src="./conds/pose.png" alt="pose"></td>
  <td><img src="./outputs/pose.png" alt="pose"></td>
</tr>
</table>

# Inference
```python
import torch
from diffusers.utils import load_image

# Qwen-Image ControlNet support requires diffusers installed from source:
# https://github.com/huggingface/diffusers/pull/12215
# pip install git+https://github.com/huggingface/diffusers
from diffusers import QwenImageControlNetPipeline, QwenImageControlNetModel

base_model = "Qwen/Qwen-Image"
controlnet_model = "InstantX/Qwen-Image-ControlNet-Union"

controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)

pipe = QwenImageControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# canny
# it is highly recommended to spell out any text elements in the prompt
control_image = load_image("conds/canny.png")
prompt = "Aesthetics art, traditional asian pagoda, elaborate golden accents, sky blue and white color palette, swirling cloud pattern, digital illustration, east asian architecture, ornamental rooftop, intricate detailing on building, cultural representation."
controlnet_conditioning_scale = 1.0

# soft edge
# control_image = load_image("conds/soft_edge.png")
# prompt = "Photograph of a young man with light brown hair jumping mid-air off a large, reddish-brown rock. He's wearing a navy blue sweater, light blue shirt, gray pants, and brown shoes. His arms are outstretched, and he has a slight smile on his face. The background features a cloudy sky and a distant, leafless tree line. The grass around the rock is patchy."
# controlnet_conditioning_scale = 1.0

# depth
# control_image = load_image("conds/depth.png")
# prompt = "A swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. A beige couch with white cushions sits on a wooden floor, with a matching coffee table in front. The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. Sunlight pours through the leaves outside, casting cool shadows on the floor."
# controlnet_conditioning_scale = 1.0

# pose
# control_image = load_image("conds/pose.png")
# prompt = "Photograph of a young man with light brown hair and a beard, wearing a beige flat cap, black leather jacket, gray shirt, brown pants, and white sneakers. He's sitting on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall."
# controlnet_conditioning_scale = 1.0

image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    control_image=control_image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=control_image.size[0],
    height=control_image.size[1],
    num_inference_steps=30,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("qwenimage_cn_union_result.png")
```
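Since the model was trained at 1328x1328, it can help to resize control images toward the training resolution before inference. The helper below is a hypothetical sketch: it preserves aspect ratio and snaps each side to a multiple of 16 (an assumption for latent-grid compatibility, not a documented pipeline requirement):

```python
import math

TRAIN_AREA = 1328 * 1328  # training resolution from the model card

def target_size(width, height, multiple=16):
    """Scale (width, height) to roughly the training pixel area,
    rounding each side to the nearest multiple (hypothetical helper)."""
    scale = math.sqrt(TRAIN_AREA / (width * height))
    return (round(width * scale / multiple) * multiple,
            round(height * scale / multiple) * multiple)

w, h = target_size(1920, 1080)  # e.g. for a 16:9 control image
```

You could then resize the control image with `control_image.resize((w, h))` and pass `width=w, height=h` to the pipeline instead of the raw image size.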

# Inference Settings
You can adjust the control strength via `controlnet_conditioning_scale`.
- Canny: use `cv2.Canny`; set `controlnet_conditioning_scale` in [0.8, 1.0]
- Soft Edge: use [AnylineDetector](https://github.com/huggingface/controlnet_aux); set `controlnet_conditioning_scale` in [0.8, 1.0]
- Depth: use [depth-anything](https://github.com/DepthAnything/Depth-Anything-V2); set `controlnet_conditioning_scale` in [0.8, 1.0]
- Pose: use [DWPose](https://github.com/IDEA-Research/DWPose/tree/onnx); set `controlnet_conditioning_scale` in [0.8, 1.0]


We strongly recommend using detailed prompts, especially when they include text elements. For example, use "a poster with the text 'InstantX Team' on the top" instead of "a poster".


For inference with multiple conditions, please refer to this [PR](https://github.com/huggingface/diffusers/pull/12215).


# ComfyUI Support
[ComfyUI](https://www.comfy.org/) offers native support for Qwen-Image-ControlNet-Union. Check the [blog](https://blog.comfy.org/p/day-1-support-of-qwen-image-instantx) for more details.


# Community Support
[Liblib AI](https://www.liblib.art/) offers native support for Qwen-Image-ControlNet-Union. [Visit](https://www.liblib.art/sd) for online inference.


# Limitations
We find that the model may fail to preserve some details, such as small-font text, unless the text is spelled out explicitly in the prompt.


# Acknowledgements
This model is developed by the InstantX Team. All rights reserved.