Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published May 27 • 93
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR Paper • 2605.15726 • Published May 15 • 35
agent-distillation/Qwen2.5-32B-Instruct_cot_trajectories_2k Viewer • Updated Jun 9, 2025 • 3k • 32 • 1