Tencent-Hunyuan-Multimodal-RL/FLUX2-klein-base-9b-GenEval2-Multi-Reward
Text-to-Image • Updated • 4
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models