Reinforcement Learning with Promising Tokens for Large Language Models • Paper 2602.03195 • Published Feb 3
Language Model Self-improvement by Reinforcement Learning Contemplation • Paper 2305.14483 • Published May 23, 2023