- AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
  Paper • 2509.08755 • Published • 56
- The Majority is not always right: RL training for solution aggregation
  Paper • 2509.06870 • Published • 16
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 101
- Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
  Paper • 2509.03646 • Published • 32
kongqi
Llm
timeseries