AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? Paper • 2606.05080 • Published 6 days ago • 27
Agent Skills Should Go Beyond Text: The Case for Visual Skills Paper • 2606.01414 • Published 9 days ago • 10
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents Paper • 2605.18652 • Published 22 days ago • 8
Aurora: Unified Video Editing with a Tool-Using Agent Paper • 2605.18748 • Published 22 days ago • 29
Article Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents ibm-granite • Mar 31 • 34
MIRA: Multimodal Iterative Reasoning Agent for Image Editing Paper • 2511.21087 • Published Nov 26, 2025 • 10
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51
Granite Vision Collection Multimodal models built for visual document analysis and image understanding. • 7 items • Updated 18 days ago • 42