LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 1 day ago • 108
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents Paper • 2606.12087 • Published 8 days ago • 73
ClawGym: A Scalable Framework for Building Effective Claw Agents Paper • 2604.26904 • Published Apr 29 • 52
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models Paper • 2503.21380 • Published Mar 27, 2025 • 38