N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization Paper • 2606.10768 • Published 5 days ago • 22
Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization Paper • 2601.01483 • Published Jan 4 • 1