Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 8 days ago • 111
MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation Paper • 2606.09056 • Published 3 days ago • 4
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models Paper • 2605.30263 • Published 14 days ago • 58
CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 4 days ago • 46
VGGT-Omega Demo 🌀 Space • 3D reconstruction from images/video with VGGT-Omega • Running on Zero • Featured • 56
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 28 days ago • 86
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published 29 days ago • 101
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 24 days ago • 113
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 30 days ago • 191
Region-Constraint In-Context Generation for Instructional Video Editing Paper • 2512.17650 • Published Dec 19, 2025 • 53