- FlowRL: Matching Reward Distributions for LLM Reasoning
  Paper • 2509.15207 • Published • 114
- Kwaipilot/KAT-Dev-72B-Exp
  Text Generation • 73B • Updated • 680 • 155
- Agentic Entropy-Balanced Policy Optimization
  Paper • 2510.14545 • Published • 104
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
  Paper • 2511.13288 • Published • 17
Malkesh Dalia
malkesh2911
AI & ML interests
None yet
Recent Activity
upvoted an article about 11 hours ago
We Got Claude to Fine-Tune an Open Source LLM
liked a model about 17 hours ago
ByteDance/BindWeave
updated a collection 13 days ago
My AI
Organizations
None yet