Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ license: mit
+ ---
+ ## 💡 Overview
+
+ > *"The soul never thinks without an image." - Aristotle*
+
+ **V-Thinker** is a general-purpose multimodal reasoning assistant that enables **Interactive Thinking with Images** through end-to-end reinforcement learning. Unlike traditional vision-language models, V-Thinker actively **interacts** with visual content by editing, annotating, and transforming images to simplify complex problems.
+ ```python
+ import torch
+ import os
+ import json
+ import argparse
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, AutoConfig
+ from tqdm import tqdm
+ from utils import run_evaluation  # Assumes a local utils.py that provides run_evaluation
+ MODEL_PATH = ""  # Set this to your V-Thinker checkpoint path
+
+ config = AutoConfig.from_pretrained(MODEL_PATH)
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     MODEL_PATH,
+     device_map="auto",  # "auto" places the model on the GPUs exposed via CUDA_VISIBLE_DEVICES
+     config=config
+ )
+ processor = AutoProcessor.from_pretrained(MODEL_PATH)
+
+ question_text = "Question: Hint: Please answer the question and provide the final answer at the end.\nQuestion: How many lines of symmetry does this figure have?\n\n\nPlease provide the final answer in the format <answer>X</answer>"
+ image_path = "./224.png"
+
+ # Run the evaluation on the question and image
+ final_assistant_response, final_answer, aux_path = run_evaluation(question_text, image_path, "./", model, processor)
+ print("Model Response")
+ print(final_answer)
+ print("Auxiliary path")
+ print(aux_path)
+
+ ```
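
The prompt in the snippet asks the model to put its final answer in `<answer>X</answer>`. `run_evaluation` already returns a `final_answer`, so the following is only a minimal sketch for pulling that value out of a raw response string yourself. The helper name `extract_answer` is hypothetical and not part of the V-Thinker code, and taking the last tag pair is an assumption about how responses are laid out.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not part of the V-Thinker repository.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(response: str) -> Optional[str]:
    """Return the text inside the last <answer>...</answer> pair, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    # Assumption: if the tag appears more than once, the last one is the final answer.
    return matches[-1].strip()


if __name__ == "__main__":
    print(extract_answer("The figure is a square. <answer>4</answer>"))
```

Returning `None` when no tag is found lets the caller tell a missing answer apart from an empty one.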