TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents Paper • 2604.24005 • Published 16 days ago • 8
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23, 2025 • 10