VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Paper • 2406.08394 • Published Jun 12, 2024
Language as Queries for Referring Video Object Segmentation Paper • 2201.00487 • Published Jan 3, 2022
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper • 2312.14238 • Published Dec 21, 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks Paper • 2305.11175 • Published May 18, 2023