HuangMeow
Luckyyy
AI & ML interests
None yet
Recent Activity
authored a paper about 6 hours ago
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
upvoted a paper about 6 hours ago
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
upvoted an article 4 days ago
DenseR: Dense Rewards For Free in LLM Reasoning
Organizations
None yet