Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)? Paper • 2605.30557 • Published 13 days ago • 12
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 9 days ago • 224
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion Paper • 2605.30351 • Published 13 days ago • 26
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 189
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 249