Efficient World Models with Context-Aware Tokenization Paper • 2406.19320 • Published Jun 27, 2024 • 8
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts Paper • 2510.19363 • Published Oct 22 • 61
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Paper • 2512.17008 • Published 8 days ago • 10