swhitehead and yashlara committed on
Commit 0d11015 · verified · 0 parent(s)

Super-squash branch 'main' using huggingface_hub

Co-authored-by: yashlara <yashlara@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
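The patterns above determine which files Git stores through LFS. For example, `tokenizer.json` and the `.safetensors` shards are tracked, while `config.json` and `merges.txt` are not. Below is a minimal Python sketch of the matching. It uses simplified rules: a pattern without a slash matches the file name at any depth, and a pattern with a slash is matched against the whole path via `fnmatch`. Git's own handling of `**` is more precise, so treat this as an approximation rather than Git's algorithm.

```python
import fnmatch
import posixpath

# Patterns copied from the .gitattributes above (all marked filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*", "tokenizer.json",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching (simplified, not Git's exact rules)."""
    basename = posixpath.basename(path)
    for pattern in LFS_PATTERNS:
        if "/" in pattern:
            # Slash-containing patterns are matched against the full path;
            # fnmatch's "*" also crosses "/", which roughly emulates "**".
            if fnmatch.fnmatchcase(path, pattern):
                return True
        elif fnmatch.fnmatchcase(basename, pattern):
            # Slash-free patterns match the file name at any depth.
            return True
    return False
```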
README.md ADDED
@@ -0,0 +1,259 @@
+ ---
+ license: mit
+ ---
+
+ # Fara-7B: An Efficient Agentic Model for Computer Use
+
+ [![Microsoft](https://img.shields.io/badge/Microsoft-Project-0078D4?logo=microsoft)](https://aka.ms/msaif/fara)
+ [![Hugging Face Dataset](https://img.shields.io/badge/🤗-Dataset-yellow)](https://huggingface.co/datasets/microsoft/WebTailBench)
+ [![Foundry](https://img.shields.io/badge/Azure-Foundry-0089D6)](https://aka.ms/foundry-fara-7b)
+ [![Github](https://img.shields.io/badge/Github-181717?logo=github&logoColor=white)](https://github.com/microsoft/fara)
+
+ [Official Microsoft Blog](https://www.microsoft.com/en-us/research/?p=1155843&preview=1&_ppp=0a22f3e916)<br>
+ [Technical Report](https://aka.ms/fara-techreport)<br>
+ [Github](https://github.com/microsoft/fara)<br>
+ [Microsoft Foundry](https://ai.azure.com/explore/models/Fara-7B/version/1/registry/azureml-msr?tid=72f988bf-86f1-41af-91ab-2d7cd011db47)<br>
+
+ ## Model Summary
+
+ **Developer:** Microsoft Research
+
+ **Description:**
+ Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.
+
+ **Model Architecture:**
+ A multimodal decoder-only language model that takes an image (screenshot) and text context as input. It directly predicts thoughts and actions with grounded arguments. The model is built on Qwen 2.5-VL (7B).
+
+ **Parameters:** 7 Billion
+
+ **Inputs:** User goal (text), current screenshot(s), and the history of the agent's previous outputs (thoughts and actions, as text).
+
+ **Context Length:** 128k
+
+ **Outputs:** Generated text in response to the input: a chain-of-thought block followed by a tool call block that specifies the action.
+
+ **GPUs:** 64 H100s
+
+ **Training Time:** 2.5 days
+
+ **Public Data Summary:** N/A
+
+ **Dates:** Trained between October 26 and October 29, 2025
+
+ **Status:** Static model trained on public and private data
+
+ **Release Date:** November 24th, 2025
+
+ **License:** MIT
+
+ **Model Dependencies:** Qwen 2.5 VL
+
+ **Additional Assets:** N/A
+
+ **Acceptable Use Policy:** N/A
+
+ ---
+
+ ## 1. Model Overview
+
+ Fara is a 7B Computer Use Agent (CUA) model specialized in taking actions on the web to accomplish high-level user tasks. Beyond understanding webpage layout and basic action mechanics, it plans and executes high-level goals such as booking restaurant reservations, applying for jobs, planning trips, and buying the items on a shopping list. It is trained on a large-scale, fully synthetic dataset of action trajectories generated and verified by a multi-agent pipeline.
+
+ Fara perceives the browser through screenshots, while its internal reasoning and state history are recorded as text. Given the most recent screenshots and the full history of its actions, it predicts the next action along with any required arguments (e.g., coordinates for clicks).
+
+ ### 1.1 Alignment Approach
+
+ Fara-7B's post-training safety approach draws on open-source and in-house synthetic datasets. It incorporates critical point recognition, i.e. identifying situations that require the user's permission or sensitive information, so that the agent can halt safely. The model is trained to refuse harmful tasks and has undergone automated red teaming to assess risks including grounding, jailbreaks, harmful content, and copyright violations.
+
+ ### 1.2 Safeguards
+
+ Fara-7B is trained to refuse tasks in the following categories, which violate the usage policy:
+
+ | Type | Description | Examples |
+ |------|-------------|----------|
+ | Illegal Activities | Tasks requiring unlawful actions | Terrorism-related searches, piracy, unauthorized access, weapons creation |
+ | Deceptive Tasks | Tasks that mislead or impersonate | Fake forms, fraudulent listings, phishing |
+ | High-Risk/Regulated Domains | Tasks requiring professional oversight | Medical, legal, or financial advice or approvals |
+ | Harassment, Exploitation, Hate | Tasks that harm or discriminate | Harassment content, stalking, sexualizing minors |
+ | Unsafe Technical Use | Misuse of automation | Large-scale scraping, spam, system disruption |
+ | Misinformation | Spreading false claims | Publishing unverified claims |
+ | Sexual | Erotic or pornographic tasks | Erotic roleplay, porn searches |
+
+ Critical points at which the agent stops include entering personal information, completing purchases, making calls, sending emails, submitting applications, and signing into accounts.
+
+ ---
+
+ ## 2. Usage
+
+ ### 2.1 Primary Use Cases
+
+ - Automating web tasks such as shopping, booking travel, making restaurant reservations, information seeking, and account workflows.
+ - Performing actions step by step, using multimodal understanding of browser screenshots.
+ - Running on-device, which offers privacy benefits and lower latency.
+
+ ### 2.2 Out-of-Scope Use Cases
+
+ - The model has not been evaluated for all downstream purposes; consider the limitations of LLMs in accuracy, safety, and fairness.
+ - Use must comply with applicable laws and regulations.
+ - Only English is supported.
+
+ ### 2.3 Distribution Channels
+
+ - Hugging Face
+ - Azure AI Foundry
+
+ ### 2.4 Input Formats
+
+ Because the model was trained on data in this format, always use the ChatML template with the following system prompt for inference:
+
+ ---
+
+ **System Prompt:**
+
+ You are a web automation agent that performs actions on websites to fulfill user requests by calling various tools.
+
+ You should stop execution at **Critical Points**. A Critical Point occurs in tasks like:
+
+ - Checkout
+ - Book
+ - Purchase
+ - Call
+ - Email
+ - Order
+
+ A Critical Point requires the user's permission or personal/sensitive information (name, email, credit card, address, payment information, resume, etc.) to complete a transaction (purchase, reservation, sign-up, etc.), or to communicate as a human would (call, email, apply to a job, etc.).
+
+ **Guideline:** Solve the task as far as possible **up until a Critical Point**.
+
+ **Examples:**
+
+ - If the task is to "call a restaurant to make a reservation," do **not** actually make the call. Instead, navigate to the restaurant's page and find the phone number.
+ - If the task is to "order new size 12 running shoes," do **not** place the order. Instead, search for the right shoes that meet the criteria and add them to the cart.
+
+ Some tasks, like answering questions, may not encounter a Critical Point at all.
+
+ ---
+
+ **Function Signatures:**
+
+ You are provided with function signatures within XML tags:
+
+ ```json
+ {
+   "type": "function",
+   "function": {
+     "name": "computer_use",
+     "description": "Use a mouse and keyboard to interact with a computer, and take screenshots.\n* This is an interface to a desktop GUI. You do not have access to a terminal or applications menu. You must click on desktop icons to start applications.\n* Some applications may take time to start or process actions, so you may need to wait and take successive screenshots to see the results of your actions. E.g. if you click on Firefox and a window doesn't open, try wait and taking another screenshot.\n* The screen's resolution is 1428x896.\n* Whenever you intend to move the cursor to click on an element like an icon, you should consult a screenshot to determine the coordinates of the element before moving the cursor.\n* If you tried clicking on a program or link but it failed to load, even after waiting, try adjusting your cursor position so that the tip of the cursor visually falls on the element that you want to click.\n* Make sure to click any buttons, links, icons, etc with the cursor tip in the center of the element. Don't click boxes on their edges unless asked.\n* When a separate scrollable container prominently overlays the webpage, if you want to scroll within it, you typically need to mouse_move() over it first and then scroll().\n* If a popup window appears that you want to close, if left_click() on the 'X' or close button doesn't work, try key(keys=['Escape']) to close it.\n* On some search bars, when you type(), you may need to press_enter=False and instead separately call left_click() on the search button to submit the search query. This is especially true of search bars that have auto-suggest popups for e.g. locations\n* For calendar widgets, you usually need to left_click() on arrows to move between months and left_click() on dates to select them; type() is not typically used to input dates there.",
+     "parameters": {
+       "properties": {
+         "action": {
+           "description": "The action to perform. The available actions are:\n* key: Performs key down presses on the arguments passed in order, then performs key releases in reverse order. Includes 'Enter', 'Alt', 'Shift', 'Tab', 'Control', 'Backspace', 'Delete', 'Escape', 'ArrowUp', 'ArrowDown', 'ArrowLeft', 'ArrowRight', 'PageDown', 'PageUp', 'Shift', etc.\n* type: Type a string of text on the keyboard.\n* mouse_move: Move the cursor to a specified (x, y) pixel coordinate on the screen.\n* left_click: Click the left mouse button.\n* scroll: Performs a scroll of the mouse scroll wheel.\n* visit_url: Visit a specified URL.\n* web_search: Perform a web search with a specified query.\n* history_back: Go back to the previous page in the browser history.\n* pause_and_memorize_fact: Pause and memorize a fact for future reference.\n* wait: Wait specified seconds for the change to happen.\n* terminate: Terminate the current task and report its completion status.",
+           "enum": ["key", "type", "mouse_move", "left_click", "scroll", "visit_url", "web_search", "history_back", "pause_and_memorize_fact", "wait", "terminate"],
+           "type": "string"
+         },
+         "keys": {"description": "Required only by action=key.", "type": "array"},
+         "text": {"description": "Required only by action=type.", "type": "string"},
+         "coordinate": {"description": "(x, y) coordinates for mouse actions. Required only by action=left_click, action=mouse_move, and action=type.", "type": "array"},
+         "pixels": {"description": "Amount of scrolling. Positive = up, Negative = down. Required only by action=scroll.", "type": "number"},
+         "url": {"description": "The URL to visit. Required only by action=visit_url.", "type": "string"},
+         "query": {"description": "The query to search for. Required only by action=web_search.", "type": "string"},
+         "fact": {"description": "The fact to remember for the future. Required only by action=pause_and_memorize_fact.", "type": "string"},
+         "time": {"description": "Seconds to wait. Required only by action=wait.", "type": "number"},
+         "status": {"description": "Status of the task. Required only by action=terminate.", "type": "string", "enum": ["success", "failure"]}
+       },
+       "required": ["action"],
+       "type": "object"
+     }
+   }
+ }
+ ```
+
+ For each function call, return a JSON object with the function name and arguments within XML tags:
+
+ ```json
+ {
+   "name": "<function-name>",
+   "arguments": <args-json-object>
+ }
+ ```
+
+ - Function signatures are provided for all actions (`key`, `type`, `mouse_move`, `left_click`, `scroll`, `visit_url`, `web_search`, `history_back`, `pause_and_memorize_fact`, `wait`, `terminate`).
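The README says each call comes back as a JSON object inside XML tags but does not name the tags. This commit's `added_tokens.json` reserves `<tool_call>` and `</tool_call>`, so the sketch below assumes those are the tags. It extracts the first call and checks the arguments each action needs. The required arguments are transcribed from the schema's "Required only by ..." descriptions, which is why `type` needs both `text` and `coordinate`. `parse_tool_call` and `REQUIRED_ARGS` are illustrative names, not part of any published Fara API.

```python
import json
import re

# Required arguments per action, transcribed from the computer_use schema.
REQUIRED_ARGS = {
    "key": ["keys"],
    "type": ["text", "coordinate"],
    "mouse_move": ["coordinate"],
    "left_click": ["coordinate"],
    "scroll": ["pixels"],
    "visit_url": ["url"],
    "web_search": ["query"],
    "history_back": [],
    "pause_and_memorize_fact": ["fact"],
    "wait": ["time"],
    "terminate": ["status"],
}

# Assumption: each call is wrapped in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_call(response: str) -> dict:
    """Extract and validate the first computer_use call in a model response."""
    match = TOOL_CALL_RE.search(response)
    if match is None:
        raise ValueError("no <tool_call> block found")
    call = json.loads(match.group(1))
    if call.get("name") != "computer_use":
        raise ValueError(f"unexpected function: {call.get('name')!r}")
    args = call.get("arguments", {})
    action = args.get("action")
    if action not in REQUIRED_ARGS:
        raise ValueError(f"unknown action: {action!r}")
    missing = [name for name in REQUIRED_ARGS[action] if name not in args]
    if missing:
        raise ValueError(f"action {action!r} is missing {missing}")
    if action == "terminate" and args["status"] not in ("success", "failure"):
        raise ValueError("status must be 'success' or 'failure'")
    return args
```

A harness would call this on each model turn and dispatch the returned `args` to its browser controller. The text before the tag (the chain-of-thought block) is ignored here.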
180
+
181
+
182
+ ### 2.5 Technical Requirements & Integration
183
+
184
+ - Required packages: `torch >=2.7.1`, `transformers >=4.53.3`, `vllm >=0.10.0`
185
+ - Tested on NVIDIA A6000, A100, H100 GPUs (Ubuntu 24.04.3 LTS)
186
+ - Recommended on vLLM server with bf16 precision
187
+ - Provided implementation via Magentic-UI in Docker sandbox for safe web execution
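With a vLLM server, a client would typically use the OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only builds the request body, sending the screenshot as a base64 `data:` URL in an `image_url` content part. Several details are assumptions: the model name `microsoft/Fara-7B`, the order of goal text and screenshot, and `temperature: 0.0`. `build_chat_request` is a hypothetical helper.

```python
import base64


def build_chat_request(system_prompt: str, goal: str, screenshot_png: bytes,
                       model: str = "microsoft/Fara-7B") -> dict:
    """Build an OpenAI-style chat completion body for a vLLM server.

    The model name and message layout are illustrative assumptions.
    """
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return {
        "model": model,
        "messages": [
            # The Fara system prompt from section 2.4 goes here verbatim.
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": goal},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            },
        ],
        "temperature": 0.0,  # assumed: deterministic decoding
    }
```

The body would then be POSTed to a server started with something like `vllm serve microsoft/Fara-7B --dtype bfloat16` (repository id assumed; vLLM listens on port 8000 by default).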
188
+
189
+ ### 2.6 Responsible AI Considerations
190
+
191
+ - English-only; other languages may have degraded performance
192
+ - Potential stereotype reinforcement or inappropriate content
193
+ - Verify outputs, especially in high-stakes or regulated domains
194
+ - Misuse includes fraud, spam, malware generation
195
+ - Use safety services like Azure AI Content Safety where possible
196
+ - Recommended: human-in-the-loop, sandboxing, access control, output verification
197
+
198
+ ---
199
+
200
+ ## 3. Data Overview
201
+
202
+ ### 3.1 Training, Testing, Validation Datasets
203
+
204
+ - Multi-agent data generation pipeline produces synthetic trajectories from seed URLs and open-source tasks
205
+ - Records screenshots, thoughts, action traces, and verification via verifier agents
206
+ - Includes high-quality public datasets: image and text modalities
207
+ - Specialized data: grounding, UI understanding (VQA, captioning, OCR), safety/refusal datasets
208
+
209
+ ---
210
+
211
+ ## 4. Quality and Performance Evaluation
212
+ ### Table: Online Agent Evaluation Results
213
+
214
+ | Model | Params | WebVoyager | Online-M2W | DeepShop | WebTailBench |
215
+ |------------------------------|--------|------------|------------|----------|---------------|
216
+ | **SoM Agents** | | | | | |
217
+ | SoM Agent (GPT-5) | - | 90.6 | 57.7 | 49.1 | 60.4 |
218
+ | SoM Agent (o3) | - | 79.3 | 55.4 | 49.7 | 52.7 |
219
+ | SoM Agent (GPT-4o) | - | 65.1 | 34.6 | 16.0 | 30.8 |
220
+ | GLM-4.1V-9B-Thinking | 9B | 66.8 | 33.9 | 32.0 | 22.4 |
221
+ | **Computer Use Models** | | | | | |
222
+ | OpenAI computer-use-preview | - | 70.9 | 42.9 | 24.7 | 25.7 |
223
+ | UI-TARS-1.5-7B | 7B | 66.4 | 31.3 | 11.6 | 19.5 |
224
+ | Fara-7B | 7B | 73.5 | 34.1 | 26.2 | 38.4 |
225
+
226
+ The table reports task completion success rates on WebVoyager, Online-Mind2Web, DeepShop, and WebTailBench for both SoM agents and native computer-use agents.
227
+ Scores are averaged over 3 runs.
228
+
229
+ ### 4.2 Safety Evaluation & Red-Teaming
230
+
231
+ - Post-training safety with critical point design
232
+ - Red-teaming on Azure: grounding, jailbreaks, harmful content, copyright
233
+
234
+ ### Guidelines for Safe Use
235
+
236
+ - Human-in-the-loop monitoring recommended
237
+ - Do not share sensitive data
238
+ - Run in sandboxed environments
239
+ - Limit internet access via allow-lists/block-lists
240
+ - Avoid use in commercial, high-stakes, or regulated domains
241
+
242
+ **Security Considerations:**
243
+ - Automates interactions across websites, apps, OS; requires strict access control, sandboxing, and monitoring
244
+
245
+ **Contact for More Information:** MSFTAIActRequest@microsoft.com
246
+
247
+ ---
248
+
249
+ ## Appendix: Benchmarks
250
+
251
+ | Benchmark | Link |
252
+ |-----------|------|
253
+ | WebVoyager | [MinorJerry/WebVoyager](https://huggingface.co/datasets/MinorJerry/WebVoyager) |
254
+ | Online-Mind2Web | [osunlp/Online-Mind2Web](https://huggingface.co/datasets/osunlp/Online-Mind2Web) |
255
+ | DeepShop | [DeepShop/DeepShop](https://huggingface.co/datasets/DeepShop/DeepShop) |
256
+ | WebTailBench | [microsoft/WebTailBench](https://huggingface.co/datasets/microsoft/WebTailBench) |
257
+ | ScreenSpot v1 | [rootsautomation/ScreenSpot](https://huggingface.co/datasets/rootsautomation/ScreenSpot) |
258
+ | ScreenSpot v2 | [Voxel51/ScreenSpot-v2](https://huggingface.co/datasets/Voxel51/ScreenSpot-v2) |
259
+ | AgentHarm | [ai-safety-institute/AgentHarm](https://huggingface.co/datasets/ai-safety-institute/AgentHarm) |
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
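The special-token ids in `config.json` and `generation_config.json` point into this table. The small Python cross-check below uses values copied by hand from the JSON files, subset only. It shows, for example, that `eos_token_id` 151645 is `<|im_end|>` (the end of a ChatML turn) and that `image_token_id` 151655 is the `<|image_pad|>` placeholder. The helper name `describe_config_ids` is illustrative.

```python
# Subset of added_tokens.json (token string -> id).
ADDED_TOKENS = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
    "<|vision_start|>": 151652,
    "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654,
    "<|image_pad|>": 151655,
    "<|video_pad|>": 151656,
    "<tool_call>": 151657,
    "</tool_call>": 151658,
}

# Special-token ids referenced at the top level of config.json.
CONFIG_IDS = {
    "bos_token_id": 151643,
    "eos_token_id": 151645,
    "image_token_id": 151655,
    "video_token_id": 151656,
    "vision_start_token_id": 151652,
    "vision_end_token_id": 151653,
    "vision_token_id": 151654,
}

# generation_config.json stops on either of these ids.
GENERATION_EOS_IDS = [151645, 151643]

ID_TO_TOKEN = {token_id: token for token, token_id in ADDED_TOKENS.items()}


def describe_config_ids() -> dict:
    """Map each config.json id field to the token string it refers to."""
    return {field: ID_TO_TOKEN[token_id] for field, token_id in CONFIG_IDS.items()}
```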
chat_template.jinja ADDED
@@ -0,0 +1,7 @@
+ {% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ {% endif %}<|im_start|>{{ message['role'] }}
+ {% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+ {% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+ {% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+ {% endif %}
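To see exactly what string this template produces, here is a plain-Python mirror of `chat_template.jinja`, intended for inspection only. In practice one would call `apply_chat_template` on the model's processor or tokenizer. When the first message is not a system message, the template injects the generic "You are a helpful assistant." prompt, which is why the README insists on supplying the Fara system prompt explicitly. Each image contributes a single `<|image_pad|>` placeholder here. As we understand the Qwen2.5-VL processor, it later expands that placeholder to one token per merged patch. `render_chat` is a hypothetical name.

```python
def render_chat(messages, add_generation_prompt=True, add_vision_id=False):
    """Plain-Python mirror of chat_template.jinja (for inspection only)."""
    out = []
    image_count = video_count = 0
    for i, message in enumerate(messages):
        # Default system prompt when the conversation does not start with one.
        if i == 0 and message["role"] != "system":
            out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(content + "<|im_end|>\n")
            continue
        for part in content:
            if part.get("type") == "image" or "image" in part or "image_url" in part:
                image_count += 1
                if add_vision_id:
                    out.append(f"Picture {image_count}: ")
                out.append("<|vision_start|><|image_pad|><|vision_end|>")
            elif part.get("type") == "video" or "video" in part:
                video_count += 1
                if add_vision_id:
                    out.append(f"Video {video_count}: ")
                out.append("<|vision_start|><|video_pad|><|vision_end|>")
            elif "text" in part:
                out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```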
config.json ADDED
@@ -0,0 +1,104 @@
+ {
+   "architectures": [
+     "Qwen2_5_VLForConditionalGeneration"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 3584,
+   "image_token_id": 151655,
+   "initializer_range": 0.02,
+   "intermediate_size": 18944,
+   "max_position_embeddings": 128000,
+   "max_window_layers": 28,
+   "model_type": "qwen2_5_vl",
+   "num_attention_heads": 28,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 4,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": {
+     "mrope_section": [
+       16,
+       24,
+       24
+     ],
+     "rope_type": "default",
+     "type": "default"
+   },
+   "rope_theta": 1000000.0,
+   "sliding_window": 32768,
+   "text_config": {
+     "architectures": [
+       "Qwen2_5_VLForConditionalGeneration"
+     ],
+     "attention_dropout": 0.0,
+     "bos_token_id": 151643,
+     "eos_token_id": 151645,
+     "hidden_act": "silu",
+     "hidden_size": 3584,
+     "image_token_id": null,
+     "initializer_range": 0.02,
+     "intermediate_size": 18944,
+     "max_position_embeddings": 128000,
+     "max_window_layers": 28,
+     "model_type": "qwen2_5_vl_text",
+     "num_attention_heads": 28,
+     "num_hidden_layers": 28,
+     "num_key_value_heads": 4,
+     "rms_norm_eps": 1e-06,
+     "rope_scaling": {
+       "mrope_section": [
+         16,
+         24,
+         24
+       ],
+       "rope_type": "default",
+       "type": "default"
+     },
+     "rope_theta": 1000000.0,
+     "sliding_window": 32768,
+     "torch_dtype": "bfloat16",
+     "use_cache": true,
+     "use_sliding_window": false,
+     "video_token_id": null,
+     "vision_end_token_id": 151653,
+     "vision_start_token_id": 151652,
+     "vision_token_id": 151654,
+     "vocab_size": 152064
+   },
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.52.4",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "video_token_id": 151656,
+   "vision_config": {
+     "depth": 32,
+     "fullatt_block_indexes": [
+       7,
+       15,
+       23,
+       31
+     ],
+     "hidden_act": "silu",
+     "hidden_size": 1280,
+     "in_channels": 3,
+     "in_chans": 3,
+     "initializer_range": 0.02,
+     "intermediate_size": 3420,
+     "model_type": "qwen2_5_vl",
+     "num_heads": 16,
+     "out_hidden_size": 3584,
+     "patch_size": 14,
+     "spatial_merge_size": 2,
+     "spatial_patch_size": 14,
+     "temporal_patch_size": 2,
+     "tokens_per_second": 2,
+     "window_size": 112
+   },
+   "vision_end_token_id": 151653,
+   "vision_start_token_id": 151652,
+   "vision_token_id": 151654,
+   "vocab_size": 152064
+ }
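Several sizing numbers follow directly from `config.json`. The sketch below derives the attention head size, the grouped-query ratio, the bf16 KV-cache footprint per token, and the number of visual tokens per screenshot. It rests on two assumptions. First, each merged visual token covers a `patch_size * spatial_merge_size` = 28x28 pixel cell. Second, the 1428x896 screenshots named in the system prompt are not resized by the image processor. The vision encoder's own activations are not counted.

```python
# Values copied from config.json (text model and vision_config).
HIDDEN_SIZE = 3584
NUM_LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
MROPE_SECTION = [16, 24, 24]
PATCH_SIZE = 14
SPATIAL_MERGE_SIZE = 2
BYTES_PER_VALUE = 2  # torch_dtype is bfloat16

head_dim = HIDDEN_SIZE // NUM_HEADS              # 128
queries_per_kv_head = NUM_HEADS // NUM_KV_HEADS  # grouped-query attention: 7

# The M-RoPE sections split the head_dim / 2 rotary frequencies
# across the temporal, height and width position components.
assert sum(MROPE_SECTION) == head_dim // 2


def kv_cache_bytes(num_tokens: int) -> int:
    """bf16 bytes for keys and values across all layers and KV heads."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * BYTES_PER_VALUE
    return per_token * num_tokens


def screenshot_tokens(width: int, height: int) -> int:
    """Visual tokens for one image, assuming it is not resized."""
    cell = PATCH_SIZE * SPATIAL_MERGE_SIZE  # 28 px per merged-token side
    if width % cell or height % cell:
        raise ValueError("dimensions must be multiples of 28 in this sketch")
    return (width // cell) * (height // cell)
```

Under these assumptions a 1428x896 screenshot costs 1632 tokens (a 51x32 grid). A full 128,000-token context needs about 7.3 GB of KV cache per sequence. That budget is worth keeping in mind, since each step feeds the model recent screenshots plus the full action history.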
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.05,
+   "temperature": 1e-06,
+   "transformers_version": "4.52.4"
+ }
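This `generation_config.json` enables sampling but sets `temperature` to 1e-06, which makes decoding effectively greedy. It also applies a mild `repetition_penalty` of 1.05. The sketch below reproduces both steps on toy logits. The penalty rule follows our reading of Hugging Face's `RepetitionPenaltyLogitsProcessor`: positive logits of already-seen tokens are divided by the penalty, and negative ones are multiplied by it. The function names are illustrative.

```python
import math


def apply_repetition_penalty(logits, previous_ids, penalty=1.05):
    """Penalize tokens that already appeared in the generated sequence."""
    out = list(logits)
    for i in set(previous_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Take the toy logits `[2.0, 1.95, -1.0]` with tokens 0 and 2 already generated. The penalty lowers token 0 to about 1.905, so the near-zero temperature now puts all probability on token 1.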
merges.txt ADDED
The diff for this file is too large to render.
 
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d293a0b9ab4add8829d3ac57ee88bc16121ca713bfcc0e76bc9e8aeae7057e89
+ size 4968243272
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8424c791566a077b69e589c2220972b1dc3584dea3d1409b9316d2d61f0d82e9
+ size 4991495784
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:32d6e0f417c8c6aca30ac130d5728f28c3c2cbbf07933bed54a3712a59637442
+ size 4932751008
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c0fe359fdfbf4d5d6a32f4024b91f8db77ebd94cabe16d7fe91bf6f357ba210
+ size 1691924352
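Each shard above is stored as a Git LFS pointer: a spec version, the SHA-256 of the content, and the size in bytes. The sketch below parses one pointer and sums the four sizes. The total, 16,584,414,416 bytes, exceeds the `total_size` in `model.safetensors.index.json` (16,584,333,312) by 81,104 bytes. Assuming `total_size` counts tensor data only, that gap would be the safetensors headers: an 8-byte length prefix plus a JSON header in each file. `parse_lfs_pointer` is an illustrative helper (it uses `str.removeprefix`, so Python 3.9+).

```python
# One pointer copied verbatim from model-00001-of-00004.safetensors.
POINTER_1 = """version https://git-lfs.github.com/spec/v1
oid sha256:d293a0b9ab4add8829d3ac57ee88bc16121ca713bfcc0e76bc9e8aeae7057e89
size 4968243272
"""

# Sizes of all four shards, from the pointers above.
SHARD_SIZES = [4968243272, 4991495784, 4932751008, 1691924352]

# "total_size" from model.safetensors.index.json.
INDEX_TOTAL_SIZE = 16584333312


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one "key value" pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }


total_file_bytes = sum(SHARD_SIZES)
header_overhead = total_file_bytes - INDEX_TOTAL_SIZE
```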
model.safetensors.index.json ADDED
@@ -0,0 +1,736 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 16584333312
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00004-of-00004.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
19
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
26
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
28
+ "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
29
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
30
+ "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
31
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
32
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
34
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
38
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
40
+ "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
41
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
43
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
50
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
53
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
55
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
62
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
63
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
64
+ "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
65
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
67
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
69
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
70
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
71
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
73
+ "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
74
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
77
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
79
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.16.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.16.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.q_proj.bias": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+ "model.norm.weight": "model-00004-of-00004.safetensors",
+ "visual.blocks.0.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.0.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.1.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.10.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.11.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.12.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.13.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.14.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.15.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.16.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.norm1.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.17.norm2.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.18.attn.proj.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.18.attn.proj.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.18.attn.qkv.bias": "model-00001-of-00004.safetensors",
+ "visual.blocks.18.attn.qkv.weight": "model-00001-of-00004.safetensors",
+ "visual.blocks.18.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
470
+ "visual.blocks.18.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
471
+ "visual.blocks.18.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
472
+ "visual.blocks.18.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
473
+ "visual.blocks.18.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
474
+ "visual.blocks.18.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
475
+ "visual.blocks.18.norm1.weight": "model-00001-of-00004.safetensors",
476
+ "visual.blocks.18.norm2.weight": "model-00001-of-00004.safetensors",
477
+ "visual.blocks.19.attn.proj.bias": "model-00001-of-00004.safetensors",
478
+ "visual.blocks.19.attn.proj.weight": "model-00001-of-00004.safetensors",
479
+ "visual.blocks.19.attn.qkv.bias": "model-00001-of-00004.safetensors",
480
+ "visual.blocks.19.attn.qkv.weight": "model-00001-of-00004.safetensors",
481
+ "visual.blocks.19.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
482
+ "visual.blocks.19.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
483
+ "visual.blocks.19.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
484
+ "visual.blocks.19.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
485
+ "visual.blocks.19.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
486
+ "visual.blocks.19.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
487
+ "visual.blocks.19.norm1.weight": "model-00001-of-00004.safetensors",
488
+ "visual.blocks.19.norm2.weight": "model-00001-of-00004.safetensors",
489
+ "visual.blocks.2.attn.proj.bias": "model-00001-of-00004.safetensors",
490
+ "visual.blocks.2.attn.proj.weight": "model-00001-of-00004.safetensors",
491
+ "visual.blocks.2.attn.qkv.bias": "model-00001-of-00004.safetensors",
492
+ "visual.blocks.2.attn.qkv.weight": "model-00001-of-00004.safetensors",
493
+ "visual.blocks.2.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
494
+ "visual.blocks.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
495
+ "visual.blocks.2.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
496
+ "visual.blocks.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
497
+ "visual.blocks.2.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
498
+ "visual.blocks.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
499
+ "visual.blocks.2.norm1.weight": "model-00001-of-00004.safetensors",
500
+ "visual.blocks.2.norm2.weight": "model-00001-of-00004.safetensors",
501
+ "visual.blocks.20.attn.proj.bias": "model-00001-of-00004.safetensors",
502
+ "visual.blocks.20.attn.proj.weight": "model-00001-of-00004.safetensors",
503
+ "visual.blocks.20.attn.qkv.bias": "model-00001-of-00004.safetensors",
504
+ "visual.blocks.20.attn.qkv.weight": "model-00001-of-00004.safetensors",
505
+ "visual.blocks.20.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
506
+ "visual.blocks.20.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
507
+ "visual.blocks.20.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
508
+ "visual.blocks.20.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
509
+ "visual.blocks.20.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
510
+ "visual.blocks.20.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
511
+ "visual.blocks.20.norm1.weight": "model-00001-of-00004.safetensors",
512
+ "visual.blocks.20.norm2.weight": "model-00001-of-00004.safetensors",
513
+ "visual.blocks.21.attn.proj.bias": "model-00001-of-00004.safetensors",
514
+ "visual.blocks.21.attn.proj.weight": "model-00001-of-00004.safetensors",
515
+ "visual.blocks.21.attn.qkv.bias": "model-00001-of-00004.safetensors",
516
+ "visual.blocks.21.attn.qkv.weight": "model-00001-of-00004.safetensors",
517
+ "visual.blocks.21.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
518
+ "visual.blocks.21.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
519
+ "visual.blocks.21.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
520
+ "visual.blocks.21.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
521
+ "visual.blocks.21.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
522
+ "visual.blocks.21.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
523
+ "visual.blocks.21.norm1.weight": "model-00001-of-00004.safetensors",
524
+ "visual.blocks.21.norm2.weight": "model-00001-of-00004.safetensors",
525
+ "visual.blocks.22.attn.proj.bias": "model-00001-of-00004.safetensors",
526
+ "visual.blocks.22.attn.proj.weight": "model-00001-of-00004.safetensors",
527
+ "visual.blocks.22.attn.qkv.bias": "model-00001-of-00004.safetensors",
528
+ "visual.blocks.22.attn.qkv.weight": "model-00001-of-00004.safetensors",
529
+ "visual.blocks.22.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
530
+ "visual.blocks.22.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
531
+ "visual.blocks.22.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
532
+ "visual.blocks.22.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
533
+ "visual.blocks.22.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
534
+ "visual.blocks.22.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
535
+ "visual.blocks.22.norm1.weight": "model-00001-of-00004.safetensors",
536
+ "visual.blocks.22.norm2.weight": "model-00001-of-00004.safetensors",
537
+ "visual.blocks.23.attn.proj.bias": "model-00001-of-00004.safetensors",
538
+ "visual.blocks.23.attn.proj.weight": "model-00001-of-00004.safetensors",
539
+ "visual.blocks.23.attn.qkv.bias": "model-00001-of-00004.safetensors",
540
+ "visual.blocks.23.attn.qkv.weight": "model-00001-of-00004.safetensors",
541
+ "visual.blocks.23.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
542
+ "visual.blocks.23.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
543
+ "visual.blocks.23.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
544
+ "visual.blocks.23.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
545
+ "visual.blocks.23.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
546
+ "visual.blocks.23.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
547
+ "visual.blocks.23.norm1.weight": "model-00001-of-00004.safetensors",
548
+ "visual.blocks.23.norm2.weight": "model-00001-of-00004.safetensors",
549
+ "visual.blocks.24.attn.proj.bias": "model-00001-of-00004.safetensors",
550
+ "visual.blocks.24.attn.proj.weight": "model-00001-of-00004.safetensors",
551
+ "visual.blocks.24.attn.qkv.bias": "model-00001-of-00004.safetensors",
552
+ "visual.blocks.24.attn.qkv.weight": "model-00001-of-00004.safetensors",
553
+ "visual.blocks.24.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
554
+ "visual.blocks.24.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
555
+ "visual.blocks.24.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
556
+ "visual.blocks.24.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
557
+ "visual.blocks.24.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
558
+ "visual.blocks.24.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
559
+ "visual.blocks.24.norm1.weight": "model-00001-of-00004.safetensors",
560
+ "visual.blocks.24.norm2.weight": "model-00001-of-00004.safetensors",
561
+ "visual.blocks.25.attn.proj.bias": "model-00001-of-00004.safetensors",
562
+ "visual.blocks.25.attn.proj.weight": "model-00001-of-00004.safetensors",
563
+ "visual.blocks.25.attn.qkv.bias": "model-00001-of-00004.safetensors",
564
+ "visual.blocks.25.attn.qkv.weight": "model-00001-of-00004.safetensors",
565
+ "visual.blocks.25.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
566
+ "visual.blocks.25.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
567
+ "visual.blocks.25.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
568
+ "visual.blocks.25.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
569
+ "visual.blocks.25.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
570
+ "visual.blocks.25.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
571
+ "visual.blocks.25.norm1.weight": "model-00001-of-00004.safetensors",
572
+ "visual.blocks.25.norm2.weight": "model-00001-of-00004.safetensors",
573
+ "visual.blocks.26.attn.proj.bias": "model-00001-of-00004.safetensors",
574
+ "visual.blocks.26.attn.proj.weight": "model-00001-of-00004.safetensors",
575
+ "visual.blocks.26.attn.qkv.bias": "model-00001-of-00004.safetensors",
576
+ "visual.blocks.26.attn.qkv.weight": "model-00001-of-00004.safetensors",
577
+ "visual.blocks.26.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
578
+ "visual.blocks.26.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
579
+ "visual.blocks.26.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
580
+ "visual.blocks.26.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
581
+ "visual.blocks.26.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
582
+ "visual.blocks.26.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
583
+ "visual.blocks.26.norm1.weight": "model-00001-of-00004.safetensors",
584
+ "visual.blocks.26.norm2.weight": "model-00001-of-00004.safetensors",
585
+ "visual.blocks.27.attn.proj.bias": "model-00001-of-00004.safetensors",
586
+ "visual.blocks.27.attn.proj.weight": "model-00001-of-00004.safetensors",
587
+ "visual.blocks.27.attn.qkv.bias": "model-00001-of-00004.safetensors",
588
+ "visual.blocks.27.attn.qkv.weight": "model-00001-of-00004.safetensors",
589
+ "visual.blocks.27.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
590
+ "visual.blocks.27.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
591
+ "visual.blocks.27.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
592
+ "visual.blocks.27.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
593
+ "visual.blocks.27.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
594
+ "visual.blocks.27.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
595
+ "visual.blocks.27.norm1.weight": "model-00001-of-00004.safetensors",
596
+ "visual.blocks.27.norm2.weight": "model-00001-of-00004.safetensors",
597
+ "visual.blocks.28.attn.proj.bias": "model-00001-of-00004.safetensors",
598
+ "visual.blocks.28.attn.proj.weight": "model-00001-of-00004.safetensors",
599
+ "visual.blocks.28.attn.qkv.bias": "model-00001-of-00004.safetensors",
600
+ "visual.blocks.28.attn.qkv.weight": "model-00001-of-00004.safetensors",
601
+ "visual.blocks.28.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
602
+ "visual.blocks.28.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
603
+ "visual.blocks.28.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
604
+ "visual.blocks.28.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
605
+ "visual.blocks.28.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
606
+ "visual.blocks.28.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
607
+ "visual.blocks.28.norm1.weight": "model-00001-of-00004.safetensors",
608
+ "visual.blocks.28.norm2.weight": "model-00001-of-00004.safetensors",
609
+ "visual.blocks.29.attn.proj.bias": "model-00001-of-00004.safetensors",
610
+ "visual.blocks.29.attn.proj.weight": "model-00001-of-00004.safetensors",
611
+ "visual.blocks.29.attn.qkv.bias": "model-00001-of-00004.safetensors",
612
+ "visual.blocks.29.attn.qkv.weight": "model-00001-of-00004.safetensors",
613
+ "visual.blocks.29.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
614
+ "visual.blocks.29.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
615
+ "visual.blocks.29.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
616
+ "visual.blocks.29.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
617
+ "visual.blocks.29.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
618
+ "visual.blocks.29.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
619
+ "visual.blocks.29.norm1.weight": "model-00001-of-00004.safetensors",
620
+ "visual.blocks.29.norm2.weight": "model-00001-of-00004.safetensors",
621
+ "visual.blocks.3.attn.proj.bias": "model-00001-of-00004.safetensors",
622
+ "visual.blocks.3.attn.proj.weight": "model-00001-of-00004.safetensors",
623
+ "visual.blocks.3.attn.qkv.bias": "model-00001-of-00004.safetensors",
624
+ "visual.blocks.3.attn.qkv.weight": "model-00001-of-00004.safetensors",
625
+ "visual.blocks.3.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
626
+ "visual.blocks.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
627
+ "visual.blocks.3.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
628
+ "visual.blocks.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
629
+ "visual.blocks.3.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
630
+ "visual.blocks.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
631
+ "visual.blocks.3.norm1.weight": "model-00001-of-00004.safetensors",
632
+ "visual.blocks.3.norm2.weight": "model-00001-of-00004.safetensors",
633
+ "visual.blocks.30.attn.proj.bias": "model-00001-of-00004.safetensors",
634
+ "visual.blocks.30.attn.proj.weight": "model-00001-of-00004.safetensors",
635
+ "visual.blocks.30.attn.qkv.bias": "model-00001-of-00004.safetensors",
636
+ "visual.blocks.30.attn.qkv.weight": "model-00001-of-00004.safetensors",
637
+ "visual.blocks.30.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
638
+ "visual.blocks.30.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
639
+ "visual.blocks.30.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
640
+ "visual.blocks.30.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
641
+ "visual.blocks.30.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
642
+ "visual.blocks.30.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
643
+ "visual.blocks.30.norm1.weight": "model-00001-of-00004.safetensors",
644
+ "visual.blocks.30.norm2.weight": "model-00001-of-00004.safetensors",
645
+ "visual.blocks.31.attn.proj.bias": "model-00001-of-00004.safetensors",
646
+ "visual.blocks.31.attn.proj.weight": "model-00001-of-00004.safetensors",
647
+ "visual.blocks.31.attn.qkv.bias": "model-00001-of-00004.safetensors",
648
+ "visual.blocks.31.attn.qkv.weight": "model-00001-of-00004.safetensors",
649
+ "visual.blocks.31.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
650
+ "visual.blocks.31.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
651
+ "visual.blocks.31.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
652
+ "visual.blocks.31.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
653
+ "visual.blocks.31.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
654
+ "visual.blocks.31.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
655
+ "visual.blocks.31.norm1.weight": "model-00001-of-00004.safetensors",
656
+ "visual.blocks.31.norm2.weight": "model-00001-of-00004.safetensors",
657
+ "visual.blocks.4.attn.proj.bias": "model-00001-of-00004.safetensors",
658
+ "visual.blocks.4.attn.proj.weight": "model-00001-of-00004.safetensors",
659
+ "visual.blocks.4.attn.qkv.bias": "model-00001-of-00004.safetensors",
660
+ "visual.blocks.4.attn.qkv.weight": "model-00001-of-00004.safetensors",
661
+ "visual.blocks.4.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
662
+ "visual.blocks.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
663
+ "visual.blocks.4.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
664
+ "visual.blocks.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
665
+ "visual.blocks.4.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
666
+ "visual.blocks.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
667
+ "visual.blocks.4.norm1.weight": "model-00001-of-00004.safetensors",
668
+ "visual.blocks.4.norm2.weight": "model-00001-of-00004.safetensors",
669
+ "visual.blocks.5.attn.proj.bias": "model-00001-of-00004.safetensors",
670
+ "visual.blocks.5.attn.proj.weight": "model-00001-of-00004.safetensors",
671
+ "visual.blocks.5.attn.qkv.bias": "model-00001-of-00004.safetensors",
672
+ "visual.blocks.5.attn.qkv.weight": "model-00001-of-00004.safetensors",
673
+ "visual.blocks.5.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
674
+ "visual.blocks.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
675
+ "visual.blocks.5.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
676
+ "visual.blocks.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
677
+ "visual.blocks.5.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
678
+ "visual.blocks.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
679
+ "visual.blocks.5.norm1.weight": "model-00001-of-00004.safetensors",
680
+ "visual.blocks.5.norm2.weight": "model-00001-of-00004.safetensors",
681
+ "visual.blocks.6.attn.proj.bias": "model-00001-of-00004.safetensors",
682
+ "visual.blocks.6.attn.proj.weight": "model-00001-of-00004.safetensors",
683
+ "visual.blocks.6.attn.qkv.bias": "model-00001-of-00004.safetensors",
684
+ "visual.blocks.6.attn.qkv.weight": "model-00001-of-00004.safetensors",
685
+ "visual.blocks.6.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
686
+ "visual.blocks.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
687
+ "visual.blocks.6.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
688
+ "visual.blocks.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
689
+ "visual.blocks.6.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
690
+ "visual.blocks.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
691
+ "visual.blocks.6.norm1.weight": "model-00001-of-00004.safetensors",
692
+ "visual.blocks.6.norm2.weight": "model-00001-of-00004.safetensors",
693
+ "visual.blocks.7.attn.proj.bias": "model-00001-of-00004.safetensors",
694
+ "visual.blocks.7.attn.proj.weight": "model-00001-of-00004.safetensors",
695
+ "visual.blocks.7.attn.qkv.bias": "model-00001-of-00004.safetensors",
696
+ "visual.blocks.7.attn.qkv.weight": "model-00001-of-00004.safetensors",
697
+ "visual.blocks.7.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
698
+ "visual.blocks.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
699
+ "visual.blocks.7.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
700
+ "visual.blocks.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
701
+ "visual.blocks.7.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
702
+ "visual.blocks.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
703
+ "visual.blocks.7.norm1.weight": "model-00001-of-00004.safetensors",
704
+ "visual.blocks.7.norm2.weight": "model-00001-of-00004.safetensors",
705
+ "visual.blocks.8.attn.proj.bias": "model-00001-of-00004.safetensors",
706
+ "visual.blocks.8.attn.proj.weight": "model-00001-of-00004.safetensors",
707
+ "visual.blocks.8.attn.qkv.bias": "model-00001-of-00004.safetensors",
708
+ "visual.blocks.8.attn.qkv.weight": "model-00001-of-00004.safetensors",
709
+ "visual.blocks.8.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
710
+ "visual.blocks.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
711
+ "visual.blocks.8.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
712
+ "visual.blocks.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
713
+ "visual.blocks.8.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
714
+ "visual.blocks.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
715
+ "visual.blocks.8.norm1.weight": "model-00001-of-00004.safetensors",
716
+ "visual.blocks.8.norm2.weight": "model-00001-of-00004.safetensors",
717
+ "visual.blocks.9.attn.proj.bias": "model-00001-of-00004.safetensors",
718
+ "visual.blocks.9.attn.proj.weight": "model-00001-of-00004.safetensors",
719
+ "visual.blocks.9.attn.qkv.bias": "model-00001-of-00004.safetensors",
720
+ "visual.blocks.9.attn.qkv.weight": "model-00001-of-00004.safetensors",
721
+ "visual.blocks.9.mlp.down_proj.bias": "model-00001-of-00004.safetensors",
722
+ "visual.blocks.9.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
723
+ "visual.blocks.9.mlp.gate_proj.bias": "model-00001-of-00004.safetensors",
724
+ "visual.blocks.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
725
+ "visual.blocks.9.mlp.up_proj.bias": "model-00001-of-00004.safetensors",
726
+ "visual.blocks.9.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
727
+ "visual.blocks.9.norm1.weight": "model-00001-of-00004.safetensors",
728
+ "visual.blocks.9.norm2.weight": "model-00001-of-00004.safetensors",
729
+ "visual.merger.ln_q.weight": "model-00001-of-00004.safetensors",
730
+ "visual.merger.mlp.0.bias": "model-00001-of-00004.safetensors",
731
+ "visual.merger.mlp.0.weight": "model-00001-of-00004.safetensors",
732
+ "visual.merger.mlp.2.bias": "model-00001-of-00004.safetensors",
733
+ "visual.merger.mlp.2.weight": "model-00001-of-00004.safetensors",
734
+ "visual.patch_embed.proj.weight": "model-00001-of-00004.safetensors"
735
+ }
736
+ }
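The tensor-to-shard entries above are listed in plain string order, which is why `visual.blocks.2.*` sits between `visual.blocks.19.*` and `visual.blocks.20.*`. A minimal sketch for regrouping the vision-tower entries by numeric block index, assuming only the `visual.blocks.<n>.<param>` naming visible here; the short `weight_map` dictionary and the `group_vision_blocks` helper are hypothetical illustrations, not part of any library.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of the tensor-name -> shard-file mapping shown above.
weight_map = {
    "visual.blocks.19.norm2.weight": "model-00001-of-00004.safetensors",
    "visual.blocks.2.attn.proj.bias": "model-00001-of-00004.safetensors",
    "visual.blocks.20.attn.proj.bias": "model-00001-of-00004.safetensors",
    "visual.merger.ln_q.weight": "model-00001-of-00004.safetensors",
}

BLOCK_RE = re.compile(r"^visual\.blocks\.(\d+)\.(.+)$")


def group_vision_blocks(mapping: dict[str, str]) -> dict[int, list[str]]:
    """Group visual.blocks.<n>.* parameter names by integer block index n."""
    blocks: dict[int, list[str]] = defaultdict(list)
    for name in mapping:
        match = BLOCK_RE.match(name)
        if match:  # non-block tensors such as visual.merger.* are skipped
            blocks[int(match.group(1))].append(match.group(2))
    # Sort by the integer index so block 2 precedes block 19.
    return {n: sorted(params) for n, params in sorted(blocks.items())}


print(sorted(weight_map))                     # string order: 19, 2, 20, merger
print(list(group_vision_blocks(weight_map)))  # numeric order: [2, 19, 20]
```

Sorting by the parsed integer rather than the raw key is the whole point: the string order is an artifact of JSON key sorting, not of the model's layer order.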
preprocessor_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "do_convert_rgb": true,
+ "do_normalize": true,
+ "do_rescale": true,
+ "do_resize": true,
+ "image_mean": [
+ 0.48145466,
+ 0.4578275,
+ 0.40821073
+ ],
+ "image_processor_type": "Qwen2VLImageProcessor",
+ "image_std": [
+ 0.26862954,
+ 0.26130258,
+ 0.27577711
+ ],
+ "max_pixels": 12845056,
+ "merge_size": 2,
+ "min_pixels": 3136,
+ "patch_size": 14,
+ "processor_class": "Qwen2_5_VLProcessor",
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "size": {
+ "longest_edge": 12845056,
+ "shortest_edge": 3136
+ },
+ "temporal_patch_size": 2
+ }
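With `patch_size` 14 and `merge_size` 2, each visual token covers a 28×28 pixel cell, so `min_pixels` 3136 = 4·28² and `max_pixels` 12845056 = 16384·28²: an image ends up with between 4 and 16384 tokens. A simplified sketch of how an input size could be snapped to that grid, modeled on the `smart_resize` rounding used by the Qwen2-VL image processor in Hugging Face Transformers; the exact rounding, the omitted aspect-ratio checks, and the helper names `resize_for_patches` and `image_token_count` are assumptions for illustration.

```python
import math

PATCH_SIZE = 14        # "patch_size" above
MERGE_SIZE = 2         # "merge_size" above
MIN_PIXELS = 3136      # "min_pixels" above (4 * 28 * 28)
MAX_PIXELS = 12845056  # "max_pixels" above (16384 * 28 * 28)


def resize_for_patches(height: int, width: int,
                       factor: int = PATCH_SIZE * MERGE_SIZE,
                       min_pixels: int = MIN_PIXELS,
                       max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Snap (height, width) to multiples of `factor`, keeping the area within [min_pixels, max_pixels]."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Shrink both sides by the same ratio and round down to stay under the cap.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Grow both sides by the same ratio and round up to reach the floor.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def image_token_count(height: int, width: int) -> int:
    """Tokens after the 2x2 merge: one per 28x28 pixel cell of the resized image."""
    factor = PATCH_SIZE * MERGE_SIZE
    h_bar, w_bar = resize_for_patches(height, width)
    return (h_bar // factor) * (w_bar // factor)
```

For a 1080×1920 input this sketch gives 1092×1932, i.e. 39 × 69 = 2691 visual tokens. Rounding to the nearest multiple of 28 keeps the aspect ratio close to the original while guaranteeing the 2×2 merge always sees whole patch groups.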
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "processor_class": "Qwen2_5_VLProcessor",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
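In `added_tokens_decoder`, ids 151643 to 151656 are flagged `"special": true`, while the tool-call and fill-in-the-middle tokens (151657 to 151664) are `"special": false`. A simplified pure-Python model of the practical consequence, assuming the usual Hugging Face behavior that decoding with `skip_special_tokens=True` drops only tokens flagged special: `<|im_end|>` (the `eos_token`) disappears, but `<tool_call>` markers survive and can still be parsed. The `decode_ids` helper and the two-entry `TEXT_PIECES` vocabulary are hypothetical stand-ins, not the tokenizer's real code or BPE vocabulary.

```python
# Subset of added_tokens_decoder above: id -> (content, "special" flag).
ADDED_TOKENS = {
    151643: ("<|endoftext|>", True),
    151644: ("<|im_start|>", True),
    151645: ("<|im_end|>", True),
    151657: ("<tool_call>", False),
    151658: ("</tool_call>", False),
}

# Hypothetical ordinary-text pieces standing in for the BPE vocabulary.
TEXT_PIECES = {1: "hello", 2: " world"}


def decode_ids(ids: list[int], skip_special_tokens: bool = False) -> str:
    """Join token strings, optionally dropping added tokens whose special flag is true."""
    out = []
    for token_id in ids:
        if token_id in ADDED_TOKENS:
            content, is_special = ADDED_TOKENS[token_id]
            if skip_special_tokens and is_special:
                continue
            out.append(content)
        else:
            out.append(TEXT_PIECES[token_id])
    return "".join(out)


ids = [151644, 151657, 1, 2, 151658, 151645]
print(decode_ids(ids, skip_special_tokens=True))  # <tool_call>hello world</tool_call>
```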
video_preprocessor_config.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "_valid_kwargs_names": [
+ "do_convert_rgb",
+ "do_resize",
+ "size",
+ "size_divisor",
+ "default_to_square",
+ "resample",
+ "do_rescale",
+ "rescale_factor",
+ "do_normalize",
+ "image_mean",
+ "image_std",
+ "do_pad",
+ "do_center_crop",
+ "crop_size",
+ "data_format",
+ "input_data_format",
+ "device",
+ "min_pixels",
+ "max_pixels",
+ "patch_size",
+ "temporal_patch_size",
+ "merge_size"
+ ],
+ "crop_size": null,
+ "data_format": "channels_first",
+ "default_to_square": true,
+ "device": null,
+ "do_center_crop": null,
+ "do_convert_rgb": true,
+ "do_normalize": true,
+ "do_pad": null,
+ "do_rescale": true,
+ "do_resize": true,
+ "image_mean": [
+ 0.48145466,
+ 0.4578275,
+ 0.40821073
+ ],
+ "image_processor_type": "Qwen2VLImageProcessor",
+ "image_std": [
+ 0.26862954,
+ 0.26130258,
+ 0.27577711
+ ],
+ "input_data_format": null,
+ "max_pixels": 12845056,
+ "merge_size": 2,
+ "min_pixels": 3136,
+ "model_valid_processing_keys": [
+ "do_convert_rgb",
+ "do_resize",
+ "size",
+ "size_divisor",
+ "default_to_square",
+ "resample",
+ "do_rescale",
+ "rescale_factor",
+ "do_normalize",
+ "image_mean",
+ "image_std",
+ "do_pad",
+ "do_center_crop",
+ "crop_size",
+ "data_format",
+ "input_data_format",
+ "device",
+ "min_pixels",
+ "max_pixels",
+ "patch_size",
+ "temporal_patch_size",
+ "merge_size"
+ ],
+ "patch_size": 14,
+ "processor_class": "Qwen2_5_VLProcessor",
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "size": {
+ "longest_edge": 12845056,
+ "shortest_edge": 3136
+ },
+ "size_divisor": null,
+ "temporal_patch_size": 2,
+ "video_processor_type": "Qwen2VLVideoProcessor"
+ }
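The video config reuses the image settings and adds `temporal_patch_size: 2`, so each visual token spans two consecutive frames as well as a 28×28 pixel cell (14-pixel patches merged 2×2). A back-of-the-envelope sketch of the resulting token count, assuming frames have already been resized to multiples of 28 and that an odd frame count is padded up to the next multiple of 2; `video_token_count` is a hypothetical helper, not part of the video processor's API.

```python
import math

PATCH_SIZE = 14          # "patch_size" above
MERGE_SIZE = 2           # "merge_size" above
TEMPORAL_PATCH_SIZE = 2  # "temporal_patch_size" above


def video_token_count(num_frames: int, height: int, width: int) -> int:
    """Visual tokens for a clip whose frames are already resized to multiples of 28 pixels."""
    factor = PATCH_SIZE * MERGE_SIZE
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    # Assumption: the frame count is padded up to a multiple of the temporal patch size.
    temporal_groups = math.ceil(num_frames / TEMPORAL_PATCH_SIZE)
    return temporal_groups * (height // factor) * (width // factor)


print(video_token_count(16, 448, 784))  # 8 * 16 * 28 = 3584
```

Because frames are paired before patching, a 16-frame clip costs the same as 8 images of the same size, which is the main reason video inputs stay tractable under the same `max_pixels` budget.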
vocab.json ADDED
The diff for this file is too large to render.