- Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks — arXiv 2401.14159, published Jan 25, 2024
- LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks — arXiv 2506.00411, published May 31, 2025
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision — arXiv 2505.13427, published May 19, 2025
- SIFT: Grounding LLM Reasoning in Contexts via Stickers — arXiv 2502.14922, published Feb 19, 2025
- Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation — arXiv 2502.05415, published Feb 8, 2025