xiaotong
xtongji
AI & ML interests
None yet
Recent Activity
upvoted a paper 1 day ago
Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
authored a paper 8 days ago
Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
authored a paper 8 days ago
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
Organizations
None yet