A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization (paper 2606.16154, published 6 days ago)