Submitted by
Tianyu Pang
Papers
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning