UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating Paper • 2606.21661 • Published 10 days ago • 24
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching Paper • 2311.12751 • Published Nov 21, 2023 • 2
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating Paper • 2606.21661 • Published 10 days ago • 24
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 20 days ago • 127
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published May 18 • 116
view article Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 167
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231 • 5
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231 • 5
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing Paper • 2604.04911 • Published Apr 6 • 36
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231 • 5
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos Paper • 2506.10857 • Published Jun 12, 2025 • 30
SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation Paper • 2601.14615 • Published Jan 21 • 1
VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis Paper • 2512.19243 • Published Dec 22, 2025 • 1