Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 6 days ago • 54
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22, 2025 • 43
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories Paper • 2606.13578 • Published 11 days ago • 54
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents Paper • 2606.12087 • Published 12 days ago • 75
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 11 days ago • 140
InterleaveThinker: Reinforcing Agentic Interleaved Generation Paper • 2606.13679 • Published 11 days ago • 80
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 11 days ago • 103
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 26 days ago • 93
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 27 days ago • 144
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning Paper • 2605.25604 • Published 28 days ago • 138
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving Paper • 2605.22809 • Published May 21 • 27