Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models Paper • 2510.13394 • Published Oct 15, 2025
Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage Paper • 2601.01685 • Published Jan 4
PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors Paper • 2605.06455 • Published 7 days ago • 3