Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views Paper • 2606.23557 • Published 4 days ago • 5
Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City Paper • 2606.20980 • Published 8 days ago • 3
CalVerT: Augmenting Agents with Calibrated Verifier Telemetry Improves Action and Learning in Knowledge-Intensive Tasks Paper • 2606.21777 • Published 7 days ago • 4
WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents Paper • 2606.18847 • Published 9 days ago • 5
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 9 days ago • 61
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents Paper • 2606.19704 • Published 8 days ago • 39
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning Paper • 2606.17682 • Published 10 days ago • 26
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains Paper • 2606.14702 • Published 14 days ago • 31
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 10 days ago • 204
MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding Paper • 2605.30794 • Published 28 days ago • 5
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA Paper • 2606.10572 • Published 16 days ago • 16
Video2LoRA: Parametric Video Internalization for Vision-Language Models Paper • 2606.04351 • Published 23 days ago • 4