OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data Paper • 2606.13432 • Published 5 days ago • 91
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Paper • 2606.12344 • Published 6 days ago • 65
WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction Paper • 2605.29341 • Published 19 days ago • 18
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 271
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 249