Improve model card: add metadata, detailed description, links, and sample usage

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +51 -2
README.md CHANGED
@@ -1,8 +1,57 @@
  ## ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
 
- [![ControlThinker](https://img.shields.io/badge/Paper-ControlThinker-d32f2f.svg?logo=arXiv)](https://arxiv.org/abs/2506.03596) [![ControlThinker](https://img.shields.io/badge/%F0%9F%A4%97%20HF%20-ControlThinker-yellow)](https://huggingface.co/maplebb/ControlThinker)
 
- This depository contains checkpoints of ControlThinker
 
  ## ✍️ Citation
 
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-to-image
+ ---
+
  ## ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
 
+ [![Paper (arXiv)](https://img.shields.io/badge/Paper-ControlThinker-d32f2f.svg?logo=arXiv)](https://arxiv.org/abs/2506.03596) [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20HF%20-Model-yellow)](https://huggingface.co/maplebb/ControlThinker) [![Hugging Face Paper](https://img.shields.io/badge/Paper-HF-blue)](https://huggingface.co/papers/2506.03596) [GitHub Repository](https://github.com/maplebb/controlthinker)
+
+ ControlThinker is a novel framework that uses a "comprehend-then-generate" paradigm for controllable image generation through visual reasoning. It addresses the semantic gap between input text prompts and target images by using a Multimodal Large Language Model (MLLM) to extract latent semantics from control images. These semantics are used to enrich the text prompts, which significantly improves the visual quality and semantic consistency of the generated images.
+
+ The model was presented in the paper [ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning](https://huggingface.co/papers/2506.03596).
+
+ <p align="center"><img src="https://github.com/maplebb/controlthinker/raw/main/asset/image/teaser.png" width="95%"></p>
+
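+ ## Usage
+
+ The example below shows how to generate an image from a text prompt with ControlThinker. `FlexARInferenceSolver` is imported from an `inference_solver` module rather than from `transformers`, so the snippet presumably needs to be run from a checkout of the GitHub repository linked above.
+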
+ ```python
+ from inference_solver import FlexARInferenceSolver
+ from PIL import Image
+
+ # ******************** Image Generation ********************
+ inference_solver = FlexARInferenceSolver(
+     model_path="maplebb/ControlThinker",
+     precision="bf16",
+     target_size=768,
+ )
+
+ # Adjacent string literals are concatenated; "\n" separates the instruction from the prompt.
+ q1 = (
+     "Generate an image of 768x768 according to the following prompt:\n"
+     "Image of a dog playing water, and a waterfall is in the background."
+ )
+
+ # generated: tuple of (generated response, list of generated images)
+ generated = inference_solver.generate(
+     images=[],
+     qas=[[q1, None]],
+     max_gen_len=8192,
+     temperature=1.0,
+     logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
+ )
+
+ a1, new_image = generated[0], generated[1][0]
+
+ # Save and display the generated image
+ new_image.save("generated_dog.png")
+ new_image.show()
+ ```
+
+ ## License
 
+ ControlThinker is licensed under the Apache 2.0 license.
 
  ## ✍️ Citation