maplebb/ControlThinker

Improve model card: add metadata, detailed description, links, and sample usage

by nielsr HF Staff - opened Nov 25

base: refs/heads/main ← from: refs/pr/1

Files changed (1)

README.md +51 -2

@@ -1,8 +1,57 @@
 ## ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
-[![ControlThinker](https://img.shields.io/badge/Paper-ControlThinker-d32f2f.svg?logo=arXiv)](https://arxiv.org/abs/2506.03596) [![ControlThinker](https://img.shields.io/badge/%F0%9F%A4%97%20HF%20-ControlThinker-yellow)](https://huggingface.co/maplebb/ControlThinker)
-This repository contains checkpoints of ControlThinker
 ## ✍️ Citation

+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-to-image
+---
 ## ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
+[![Paper (arXiv)](https://img.shields.io/badge/Paper-ControlThinker-d32f2f.svg?logo=arXiv)](https://arxiv.org/abs/2506.03596) [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20HF%20-Model-yellow)](https://huggingface.co/maplebb/ControlThinker) [![Hugging Face Paper](https://img.shields.io/badge/Paper-HF-blue)](https://huggingface.co/papers/2506.03596) [GitHub Repository](https://github.com/maplebb/controlthinker)
+ControlThinker is a framework that follows a "comprehend-then-generate" paradigm for controllable image generation through visual reasoning. It targets the semantic gap between the input text prompt and the target image: a Multimodal Large Language Model (MLLM) extracts latent semantics from the control image, and these semantics are used to enrich the prompt. The paper reports that this improves both the visual quality and the semantic consistency of the generated images.
+The model was presented in the paper [ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning](https://huggingface.co/papers/2506.03596).
+<p align="center"><img src="https://github.com/maplebb/controlthinker/raw/main/asset/image/teaser.png" width="95%"></p>
+## Usage
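+The sample below generates an image from a text-only prompt (no control image). The `FlexARInferenceSolver` class is imported from an `inference_solver` module that is not installed with `transformers`; it presumably comes from the GitHub repository linked above, so run the snippet from a checkout of that repository.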
+```python
+from inference_solver import FlexARInferenceSolver
+from PIL import Image
+# ******************** Image Generation ********************
+inference_solver = FlexARInferenceSolver(
+    model_path="maplebb/ControlThinker",
+    precision="bf16",
+    target_size=768,
+)
+q1 = "Generate an image of 768x768 according to the following prompt:\n" \
+     "Image of a dog playing water, and a waterfall is in the background."
+# generated: tuple of (generated response, list of generated images)
+generated = inference_solver.generate(
+    images=[],
+    qas=[[q1, None]],
+    max_gen_len=8192,
+    temperature=1.0,
+    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
+)
+a1, new_image = generated[0], generated[1][0]
+# You can save and display the generated image
+new_image.save("generated_dog.png")
+new_image.show()
+```
+## License
+ControlThinker is licensed under the Apache License 2.0.
 ## ✍️ Citation
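In the usage snippet, the instruction string `q1` is a fixed template: a resolution line, a newline, then the scene description. Below is a minimal sketch of a helper that builds this string. `build_generation_prompt` is hypothetical and not part of the ControlThinker repository. Tying the resolution written in the text to the `target_size` passed to `FlexARInferenceSolver` is an assumption based on the example, which uses 768 for both.

```python
def build_generation_prompt(description: str, size: int = 768) -> str:
    """Build the instruction string used as `q1` in the usage example.

    Hypothetical helper. Assumption: `size` should equal the `target_size`
    given to FlexARInferenceSolver (the example uses 768 for both).
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Resolution line, a real newline character, then the scene description.
    return (
        f"Generate an image of {size}x{size} according to the following prompt:\n"
        f"{description}"
    )


if __name__ == "__main__":
    q1 = build_generation_prompt(
        "Image of a dog playing water, and a waterfall is in the background."
    )
    print(q1)
```

Keeping the template in one function avoids the mistake of splitting the string across a literal line break, and makes it harder to change `target_size` without also changing the resolution stated in the prompt.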