Article Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence” Aug 11, 2025 • 6
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 17 days ago • 206
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21, 2024 • 17
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190
DeepPrune Collection • Parallel Scaling without Inter-trace Redundancy • 3 items • Updated Oct 10, 2025 • 2
DeepPrune: Parallel Scaling without Inter-trace Redundancy Paper • 2510.08483 • Published Oct 9, 2025 • 24