SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models • arXiv:2511.15605 • Published 19 days ago
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators • arXiv:2510.00406 • Published Oct 1