Add model card and Apache-2.0 license

#3
Files changed (1)
  1. README.md +1 -30
README.md CHANGED
@@ -32,6 +32,7 @@ injection under a common grounded reasoning policy.
 
  - **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
  - **Training data:** [IQuestLab/UniReason-Med-Data](https://huggingface.co/datasets/IQuestLab/UniReason-Med-Data)
  - **Modalities:** image + text → text
  - **License:** Apache-2.0
 
@@ -63,36 +64,6 @@ This checkpoint is the merged Hugging Face model exported from the GRPO stage.
  Training code (LLaMA-Factory for SFT, verl for GRPO) and configs are released at:
  <https://github.com/IQuestLab/unireason-med>.
 
- ## Usage
-
- ```python
- from transformers import AutoModelForImageTextToText, AutoProcessor
- from PIL import Image
-
- model_id = "IQuestLab/UniReason-Med"
- processor = AutoProcessor.from_pretrained(model_id)
- model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")
-
- image = Image.open("medical_image.png")
- messages = [
-     {
-         "role": "user",
-         "content": [
-             {"type": "image"},
-             {"type": "text", "text": "What is the most likely diagnosis? Reason step by step."},
-         ],
-     }
- ]
- prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
- output = model.generate(**inputs, max_new_tokens=1024)
- print(processor.batch_decode(output, skip_special_tokens=True)[0])
- ```
-
- The model produces interleaved reasoning with bounding boxes over the input image. Reproducing
- the full grounded crop-and-continue loop (crop the predicted region and feed it back as visual
- input) follows the agent/rollout logic in the released training code.
-
  ## Intended Use and Limitations
 
  - **Intended use:** research on medical multimodal reasoning, visual grounding, and 2D-to-3D
 
 
  - **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
  - **Training data:** [IQuestLab/UniReason-Med-Data](https://huggingface.co/datasets/IQuestLab/UniReason-Med-Data)
+ - **Code:** [github.com/IQuestLab/unireason-med](https://github.com/IQuestLab/unireason-med)
  - **Modalities:** image + text → text
  - **License:** Apache-2.0
 
 
  Training code (LLaMA-Factory for SFT, verl for GRPO) and configs are released at:
  <https://github.com/IQuestLab/unireason-med>.
 
  ## Intended Use and Limitations
 
  - **Intended use:** research on medical multimodal reasoning, visual grounding, and 2D-to-3D