WBench
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation (4 items)
Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions