ZhenYang21 committed
Commit fff425c · verified · 1 Parent(s): f1ac63e

Update README.md

Files changed (1)
  1. README.md +96 -3

---
license: apache-2.0
language:
- zh
- en
base_model:
- zai-org/GLM-4.1V-9B-Base
pipeline_tag: image-text-to-text
library_name: transformers
---

<h1>UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation</h1>

- **Repository:** https://github.com/zai-org/UI2Code_N
- **Paper:** https://arxiv.org/abs/25****

<p align="center">
  <img src="https://github.com/zheny2751-dotcom/UI2Code-N/blob/main/assets/fig1.png?raw=true" alt="UI2Code^N overview" style="width:60%;" />
</p>

**UI2Code^N** is a visual language model for interactive UI-to-code generation.
Given a UI screenshot, it generates the corresponding code and can refine that code over further interaction turns, so that output quality scales with the amount of compute spent at test time.

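As a rough sketch of what "test-time scalable, interactive" generation can look like, the loop below drafts code from a screenshot, renders the draft, and feeds the rendering back to the model for another pass. This is an illustration only: the helper callables (`generate_code`, `render_to_image`) are hypothetical placeholders, not functions provided by this repository; see the GitHub repo for the actual interactive pipeline.

```python
from typing import Callable

def refine_ui_code(
    screenshot,                      # target UI screenshot (e.g., a PIL image)
    generate_code: Callable,         # hypothetical: VLM call returning code for given images + prompt
    render_to_image: Callable,       # hypothetical: renders generated code into a screenshot
    rounds: int = 3,                 # more rounds = more test-time compute
) -> str:
    """Illustrative interactive loop: draft, render, compare, refine."""
    code = generate_code(
        images=[screenshot],
        prompt="Write code that reproduces this UI.",
    )
    for _ in range(rounds - 1):
        rendered = render_to_image(code)
        # Show the model the target and its own rendering, and ask for a revision.
        code = generate_code(
            images=[screenshot, rendered],
            prompt="Revise the code so the rendered page matches the target screenshot.",
        )
    return code
```
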
### Backbone Model

Our model is built on [GLM-4.1V-9B-Base](https://huggingface.co/zai-org/GLM-4.1V-9B-Base).

### Quick Inference

Below is a simple example of running single-image inference with the `transformers` library.
First, install the `transformers` library:

```bash
pip install "transformers>=4.57.1"
```

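Before loading the 9B checkpoint, it can be worth sanity-checking the environment. The snippet below is an optional check, not part of the official instructions: it prints the installed `transformers` version (this card asks for 4.57.1 or newer) and whether a CUDA device is visible for bfloat16 inference.

```python
import torch
import transformers

# This model card requests transformers >= 4.57.1.
print("transformers version:", transformers.__version__)

# bfloat16 inference for a 9B-parameter model is most practical on a GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```
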
Then, run the following code:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Hugging Face model ID (assumed to match this repository; adjust if your checkpoint lives elsewhere).
model_path = "zai-org/UI2Code_N"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                # Placeholder: replace with the URL of the UI screenshot you want to convert.
                "url": "https://example.com/ui_screenshot.png"
            },
            {
                "type": "text",
                "text": "Write the HTML/CSS code that reproduces the webpage in this screenshot."
            }
        ],
    }
]

processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=8192)
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(output_text)
```

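The raw output may contain special tokens (since `skip_special_tokens=False`) and explanatory text around the generated code. As a hedged follow-up, assuming the model wraps its code in a Markdown-style fenced block (an assumption, not something documented here), a small post-processing step can extract the code and save it for rendering in a browser:

```python
import re

def extract_code(output_text: str) -> str:
    """Return the first fenced code block in the model output, or the raw text if none is found.

    Assumption: the generated code is wrapped in triple-backtick fences.
    """
    match = re.search(r"```(?:\w+)?\n(.*?)```", output_text, flags=re.DOTALL)
    return match.group(1) if match else output_text

# Save the generated page so it can be opened in a browser and compared with the input screenshot.
with open("generated_ui.html", "w", encoding="utf-8") as f:
    f.write(extract_code(output_text))
```
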
See our [GitHub repo](https://github.com/zai-org/UI2Code_N) for more detailed usage.

## Citation

If you find our model useful in your work, please cite it with:

```bibtex
@article{ui2coden2025,
  title   = {UI2Code$^{N}$: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation},
  author  = {Yang, Zhen and Hong, Wenyi and Xu, Mingde and Fan, Xinyue and Wang, Weihan and Gu, Xiaotao and Tang, Jie},
  journal = {arXiv preprint arXiv:2501.XXXXX},
  year    = {2025}
}
```