philippguevorguian committed
Commit 418fbaf · verified · 1 Parent(s): 41796d1

update README

Files changed (1):
  1. README.md +106 -3
README.md CHANGED
@@ -1,3 +1,106 @@
- ---
- license: cc-by-nc-4.0
- ---

---
license: cc-by-nc-4.0
language:
- en
---
# Isaac-0.2-2B by Perceptron

Introducing the 2B-parameter variant of Isaac-0.2, the hybrid-reasoning vision-language model.

This release brings major upgrades — optional reasoning via thinking traces, perceptive tool calling (including our new Focus system), stronger grounding, better OCR, better desktop use, and improved structured output — while remaining fast, compact, and deployable.

## Extending the efficient frontier of perception

Isaac 0.2 extends what we started with Isaac 0.1: small models that outperform systems 10× larger on visual reasoning and perception tasks, all running on commodity GPUs or edge devices.
From robotics to media search to industrial inspection, Isaac 0.2 delivers high-accuracy perception without the heavy compute footprint.

![image](https://cdn-uploads.huggingface.co/production/uploads/65526dfffb76980adeffa369/yQl-9BAxLud6hhK8gCKLt.png)

## What's New in Isaac 0.2

* **Reasoning via Thinking Traces**: Short, structured reasoning traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks.

* **Perceptive Tool Calling + Focus (Zoom & Crop)**: Isaac 0.2 can trigger tool calls to focus (i.e., zoom and crop) and re-query the model on a smaller region — dramatically improving fine-grained perception (see the sketch after this list).

* **Structured Outputs**: More reliable structured output generation for consistent JSON and predictable downstream integration.

* **Complex OCR**: Improved text recognition across cluttered, low-resolution, or distorted regions — enabling accurate extraction from documents, diagrams, labels, screens, and dense real-world scenes.

* **Desktop Use**: Better performance on everyday desktop and mobile workflows such as UI understanding and navigation, making Isaac faster and more capable for agentic use cases.

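The exact tool-call interface behind Focus is not documented in this card, but the core idea can be illustrated client-side: crop a region of interest and re-run inference on the smaller, higher-detail view. The sketch below is only an illustration under that assumption; the crop coordinates are made-up placeholders, and the cropped image would then go through the same processor/generate flow shown in the Usage section further down.

```python
# Rough client-side illustration of the Focus (zoom & crop) idea.
# The crop-box coordinates are placeholders, not values produced by the model.
from transformers.image_utils import load_image

image = load_image(
    "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/refs/heads/main/huggingface/assets/example.webp"
)
left, top, right, bottom = 100, 50, 400, 300      # hypothetical region of interest
region = image.crop((left, top, right, bottom))   # "zoom" onto the region with PIL

# `region` can now stand in for the full image in the inference flow from the
# Usage section below, so the model answers from the cropped, higher-detail view.
```
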
## Performance Benchmarks

![image](https://cdn-uploads.huggingface.co/production/uploads/65526dfffb76980adeffa369/scKXlSu474L4r8-I6Ahau.png)

## Chatting with Isaac in 🤗 Transformers
Learn more at our [Huggingface Example Repo](https://github.com/perceptron-ai-inc/perceptron/tree/main/huggingface), where we demo extracting and rendering points.

```bash
pip install perceptron
```

### Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.utils.import_utils import is_torch_cuda_available
from transformers.image_utils import load_image


def document_to_messages(document: list[dict]):
    """Convert {"type", "content", "role"} items into chat messages plus a list of PIL images."""
    messages, images = [], []
    for item in document:
        if not (content := item.get("content")):
            continue
        role = item.get("role", "user")
        if item.get("type") == "image":
            images.append(load_image(content))
            messages.append({"role": role, "content": "<image>"})
        elif item.get("type") == "text":
            messages.append({"role": role, "content": content})
    return messages, images


# Load model/processor from the checkpoint
checkpoint_path = "PerceptronAI/Isaac-0.2-2B-Preview"
processor = AutoProcessor.from_pretrained(checkpoint_path, trust_remote_code=True)
device, dtype = ("cuda", "bfloat16") if is_torch_cuda_available() else ("cpu", "float32")
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_path,
    trust_remote_code=True,
    vision_attn_implementation="flash_attention_2",
    dtype=dtype,
).to(device=device)

# The document mixes text and image items; <hint>BOX</hint> is the grounding hint
# used in this example.
document = [
    {
        "type": "text",
        "content": "<hint>BOX</hint>",
        "role": "user",
    },
    {
        "type": "image",
        "content": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/refs/heads/main/huggingface/assets/example.webp",
        "role": "user",
    },
    {
        "type": "text",
        "content": "Determine whether it is safe to cross the street. Look for signage and moving traffic.",
        "role": "user",
    },
]

# Prepare inputs for generation
messages, images = document_to_messages(document)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=images, return_tensors="pt")

# Generation
generated_ids = model.generate(
    tensor_stream=inputs["tensor_stream"].to(device),
    max_new_tokens=256,
    do_sample=False,
)
generated_text = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=False)
print(f"\nOutput: {generated_text}")
```
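
To illustrate the structured-output behavior mentioned in "What's New", here is a hedged sketch that reuses the `model`, `processor`, `device`, and `document_to_messages` defined in the Usage example above. The prompt and JSON schema are illustrative choices, not a format required by the model, and the output parsing may need adjustment depending on how generation returns ids.

```python
import json

# Hedged sketch: request JSON using the same inference flow as the Usage example.
structured_document = [
    {
        "type": "image",
        "content": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/refs/heads/main/huggingface/assets/example.webp",
        "role": "user",
    },
    {
        "type": "text",
        "content": 'List the traffic signals you can see. Respond only with a JSON object with a single key "signals" whose value is a list of objects with "color" and "state" string fields.',
        "role": "user",
    },
]

messages, images = document_to_messages(structured_document)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=images, return_tensors="pt")
generated_ids = model.generate(
    tensor_stream=inputs["tensor_stream"].to(device),
    max_new_tokens=256,
    do_sample=False,
)
reply = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# Depending on how generation returns ids, `reply` may still include the prompt text,
# so take everything from the first opening brace onward and guard the parse.
start = reply.find("{")
try:
    parsed = json.loads(reply[start:]) if start != -1 else None
except ValueError:
    parsed = None
print(parsed)
```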