Update README.md
README.md (CHANGED)
@@ -11,7 +11,7 @@ metrics:
 tags:
 - multimodal
 model-index:
-- name: LLaVA-
+- name: LLaVA-Video-7B-Qwen2
   results:
   - task:
       type: multimodal

@@ -117,7 +117,7 @@ base_model:
 - lmms-lab/llava-onevision-qwen2-7b-si
 ---

-# LLaVA-
+# LLaVA-Video-7B-Qwen2

 ## Table of Contents

@@ -130,7 +130,7 @@ base_model:

 ## Model Summary

-The LLaVA-
+The LLaVA-Video models are 7B/72B-parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K) and the [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on the Qwen2 language model with a context window of 32K tokens.

 This model supports at most 64 frames.

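The 64-frame cap mentioned in the summary corresponds to how the README's `load_video` helper (part of which appears in the final hunk below) samples frames with decord before handing them to the model. Below is a minimal sketch of that sampling step, assuming the decord backend; the function name, defaults, and return format here are illustrative rather than the README's exact code.

```python
import numpy as np
from decord import VideoReader, cpu  # decord: the video backend used by the README's load_video helper


def sample_frames(video_path: str, max_frames_num: int = 64, fps: int = 1):
    """Uniformly sample at most `max_frames_num` frames (the model's 64-frame cap) from a video."""
    vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    total_frames = len(vr)
    video_time = total_frames / vr.get_avg_fps()  # duration in seconds

    # Aim for roughly `fps` sampled frames per second, then cap at max_frames_num.
    sample_count = min(max_frames_num, max(1, int(video_time * fps)))
    frame_idx = np.linspace(0, total_frames - 1, sample_count, dtype=int).tolist()
    frame_time = ",".join(f"{i / vr.get_avg_fps():.2f}s" for i in frame_idx)

    frames = vr.get_batch(frame_idx).asnumpy()  # (N, H, W, 3) uint8 array
    return frames, frame_time, video_time
```
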
@@ -143,7 +143,7 @@ This model supports at most 64 frames.

 ### Intended use

-The model was trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-
+The model was trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K) and the [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data); it can interact with images, multi-image inputs, and videos, with a particular focus on video.

@@ -186,7 +186,7 @@ def load_video(self, video_path, max_frames_num,fps=1,force_sample=False):
     spare_frames = vr.get_batch(frame_idx).asnumpy()
     # import pdb;pdb.set_trace()
     return spare_frames,frame_time,video_time
-pretrained = "lmms-lab/LLaVA-
+pretrained = "lmms-lab/LLaVA-Video-7B-Qwen2"
 model_name = "llava_qwen"
 device = "cuda"
 device_map = "auto"
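For context, the variables set in this last hunk are what the README's usage example (not fully shown in this diff) passes to the model loader. Below is a minimal sketch of that loading step, assuming the `load_pretrained_model` helper from the LLaVA-NeXT codebase (https://github.com/LLaVA-VL/LLaVA-NeXT) is installed; the accepted keyword arguments may differ between versions of that repository, so the README's own snippet remains the authoritative version.

```python
# Sketch only: assumes the LLaVA-NeXT package (`llava`) is installed from source.
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-Video-7B-Qwen2"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"

# load_pretrained_model returns the tokenizer, the multimodal model,
# the image processor, and the maximum context length.
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, device_map=device_map
)
model.eval()
```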