TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas Paper • 2603.16448 • Published 6 days ago • 55
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 193
GARDO: Reinforcing Diffusion Models without Reward Hacking Paper • 2512.24138 • Published Dec 30, 2025 • 30
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 66
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 39