AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents Paper • 2607.02255 • Published 3 days ago • 43
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published Apr 23 • 36
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published Mar 24 • 92