Router-Suggest collection: finetuned checkpoints of VLMs for multimodal auto-completion (7 items).
This model generates conversational responses conditioned on both textual and visual context. It is suitable for:
The model is not intended for:
Example usage with Hugging Face Transformers:
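from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Load the processor and the finetuned checkpoint from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("devichand/MiniCPM_V_ImgChat-7B")
model = AutoModelForVision2Seq.from_pretrained("devichand/MiniCPM_V_ImgChat-7B")

# "example.jpg" is a placeholder path; replace it with your own image file.
your_image = Image.open("example.jpg").convert("RGB")

# Encode the image and the text prompt together, then generate a response.
inputs = processor(images=your_image, text="Describe the image.", return_tensors="pt")
outputs = model.generate(**inputs)

# Decode the first generated sequence, dropping special tokens.
print(processor.decode(outputs[0], skip_special_tokens=True))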
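Since the model is described as conditioning on textual context as well as an image, the text input may need to carry earlier dialogue turns and not just a single instruction. The following is a minimal sketch of one way to flatten a conversation into a single prompt string. The `build_prompt` helper and the "Speaker: text" layout are hypothetical and not taken from this model card, so the format the checkpoint actually expects may differ.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], partial_message: str = "") -> str:
    """Flatten dialogue turns into one text prompt.

    Assumption: the "Speaker: text" layout is illustrative only; the
    model card does not document the prompt format used in training.
    """
    # One line per earlier turn, e.g. "User: hello".
    lines = [f"{speaker.strip()}: {text.strip()}" for speaker, text in history]
    # Optionally append the message the user is still typing.
    if partial_message:
        lines.append(f"User: {partial_message.strip()}")
    return "\n".join(lines)


history = [
    ("User", "Look at this photo from my trip."),
    ("Assistant", "Nice! Where was it taken?"),
]
prompt = build_prompt(history, partial_message="It was taken in")
print(prompt)
```

The resulting string could be passed as the `text` argument of the processor call in place of "Describe the image.", assuming the checkpoint accepts free-form dialogue text.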