- Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
  Paper • 2605.15980 • Published • 35
- NGRPO: Negative-enhanced Group Relative Policy Optimization
  Paper • 2509.18851 • Published • 2
- CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization
  Paper • 2605.19436 • Published • 14
- Delta Attention Residuals
  Paper • 2605.18855 • Published • 7
Vansh Kumar
Vansh2676
AI & ML interests
Interested in NLP
Recent Activity
updated a collection 3 days ago
Reinforcement learning
updated a collection 3 days ago
Reinforcement learning
updated a collection 3 days ago
Reinforcement learning
Organizations
None yet