Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published about 19 hours ago • 20
ReasonMap Collection A fine-grained visual reasoning benchmark (We show more question types in the extension dataset.) • 5 items • Updated about 6 hours ago • 8
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Paper • 2603.15478 • Published about 20 hours ago • 18
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 19 days ago • 197
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published Dec 1, 2025 • 93
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems Paper • 2512.24385 • Published Dec 30, 2025 • 8
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published Dec 29, 2025 • 15
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published Dec 22, 2025 • 30
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 32
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding Paper • 2510.23479 • Published Oct 27, 2025 • 18