Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training Paper • 2607.01232 • Published 3 days ago
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers Paper • 2604.07822 • Published Apr 9