HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models Paper • 2403.13447 • Published Mar 20, 2024 • 19
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 3 days ago • 57
Compress & Align: Curating Image-Text Data with Human Knowledge Paper • 2312.06726 • Published Dec 11, 2023 • 1