ZhenYang21 committed
Commit fff425c · verified · 1 Parent(s): f1ac63e

Update README.md

Files changed (1)
  1. README.md +96 -3

---
license: apache-2.0
language:
- zh
- en
base_model:
- zai-org/GLM-4.1V-9B-Base
pipeline_tag: image-text-to-text
library_name: transformers
---

<h1>UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation</h1>

- **Repository:** https://github.com/zai-org/UI2Code_N
- **Paper:** https://arxiv.org/abs/25****

<p align="center">
  <img src="https://github.com/zheny2751-dotcom/UI2Code-N/blob/main/assets/fig1.png?raw=true" alt="UI2Code^N overview" style="width:60%;" />
</p>

**UI2Code^N** is a visual language model for interactive UI-to-code generation.
Given a UI screenshot, it generates the corresponding code and can refine that code over further interaction turns, so that output quality scales with the amount of compute spent at test time.

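As a rough sketch of what "test-time scalable, interactive" generation can look like, the loop below drafts code from a screenshot, renders the draft, and feeds the rendering back to the model for another pass. This is an illustration only: the helper callables (`generate_code`, `render_to_image`) are hypothetical placeholders, not functions provided by this repository; see the GitHub repo for the actual interactive pipeline.

```python
from typing import Callable

def refine_ui_code(
    screenshot,                      # target UI screenshot (e.g., a PIL image)
    generate_code: Callable,         # hypothetical: VLM call returning code for given images + prompt
    render_to_image: Callable,       # hypothetical: renders generated code into a screenshot
    rounds: int = 3,                 # more rounds = more test-time compute
) -> str:
    """Illustrative interactive loop: draft, render, compare, refine."""
    code = generate_code(
        images=[screenshot],
        prompt="Write code that reproduces this UI.",
    )
    for _ in range(rounds - 1):
        rendered = render_to_image(code)
        # Show the model the target and its own rendering, and ask for a revision.
        code = generate_code(
            images=[screenshot, rendered],
            prompt="Revise the code so the rendered page matches the target screenshot.",
        )
    return code
```
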
### Backbone Model

Our model is built on [GLM-4.1V-9B-Base](https://huggingface.co/zai-org/GLM-4.1V-9B-Base).

### Quick Inference

Below is a simple example of running single-image inference with the `transformers` library.
First, install the `transformers` library:

```bash
pip install "transformers>=4.57.1"
```

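Before loading the 9B checkpoint, it can be worth sanity-checking the environment. The snippet below is an optional check, not part of the official instructions: it prints the installed `transformers` version (this card asks for 4.57.1 or newer) and whether a CUDA device is visible for bfloat16 inference.

```python
import torch
import transformers

# This model card requests transformers >= 4.57.1.
print("transformers version:", transformers.__version__)

# bfloat16 inference for a 9B-parameter model is most practical on a GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```
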
Then, run the following code:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Hugging Face model ID (assumed to match this repository; adjust if your checkpoint lives elsewhere).
model_path = "zai-org/UI2Code_N"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                # Placeholder: replace with the URL of the UI screenshot you want to convert.
                "url": "https://example.com/ui_screenshot.png"
            },
            {
                "type": "text",
                "text": "Write the HTML/CSS code that reproduces the webpage in this screenshot."
            }
        ],
    }
]

processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=8192)
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(output_text)
```

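The raw output may contain special tokens (since `skip_special_tokens=False`) and explanatory text around the generated code. As a hedged follow-up, assuming the model wraps its code in a Markdown-style fenced block (an assumption, not something documented here), a small post-processing step can extract the code and save it for rendering in a browser:

```python
import re

def extract_code(output_text: str) -> str:
    """Return the first fenced code block in the model output, or the raw text if none is found.

    Assumption: the generated code is wrapped in triple-backtick fences.
    """
    match = re.search(r"```(?:\w+)?\n(.*?)```", output_text, flags=re.DOTALL)
    return match.group(1) if match else output_text

# Save the generated page so it can be opened in a browser and compared with the input screenshot.
with open("generated_ui.html", "w", encoding="utf-8") as f:
    f.write(extract_code(output_text))
```
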
See our [GitHub repo](https://github.com/zai-org/UI2Code_N) for more detailed usage.

## Citation

If you find our model useful in your work, please cite it with:

```bibtex
@article{ui2coden2025,
  title   = {UI2Code$^{N}$: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation},
  author  = {Yang, Zhen and Hong, Wenyi and Xu, Mingde and Fan, Xinyue and Wang, Weihan and Gu, Xiaotao and Tang, Jie},
  journal = {arXiv preprint arXiv:2501.XXXXX},
  year    = {2025}
}
```