Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6, 2025 • 129
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23, 2025 • 88
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Paper • 2505.17018 • Published May 22, 2025 • 15
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Paper • 2505.14684 • Published May 20, 2025 • 24
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification Paper • 2505.16938 • Published May 22, 2025 • 121
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22, 2025 • 58
MMaDA: Multimodal Large Diffusion Language Models Paper • 2505.15809 • Published May 21, 2025 • 97
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published Mar 20, 2025 • 29
Understanding and Diagnosing Deep Reinforcement Learning Paper • 2406.16979 • Published Jun 23, 2024 • 10