Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published about 12 hours ago • 25
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research Paper • 2602.06540 • Published 7 days ago • 20