Trimming the Long-Tail of Visual World Modeling Evaluation • Paper • 2606.24256 • Published 9 days ago • 36
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints • Paper • 2606.05622 • Published 28 days ago • 44
Does Synthetic Layered Design Data Benefit Layered Design Decomposition? • Paper • 2605.15167 • Published May 14 • 9
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems • Paper • 2606.22388 • Published 11 days ago • 96