From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 66
Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025 • 11
Sample-efficient Integration of New Modalities into Large Language Models Paper • 2509.04606 • Published Sep 4, 2025 • 8
Inference-Time Hyper-Scaling with KV Cache Compression Paper • 2506.05345 • Published Jun 5, 2025 • 27
Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Paper • 2506.06006 • Published Jun 6, 2025 • 14