Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 5 days ago • 75
Go-Explore: a New Approach for Hard-Exploration Problems Paper • 1901.10995 • Published Jan 30, 2019 • 1
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024 • 20
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 272
SLiC-HF: Sequence Likelihood Calibration with Human Feedback Paper • 2305.10425 • Published May 17, 2023 • 6
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 49
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 180
panda-gym: Open-source goal-conditioned environments for robotic learning Paper • 2106.13687 • Published Jun 25, 2021 • 3