EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents Paper • 2605.13941 • Published 4 days ago • 21
G-Zero: Self-Play for Open-Ended Generation from Zero Data Paper • 2605.09959 • Published 6 days ago • 16
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification Paper • 2605.09269 • Published 7 days ago • 5
Reinforcing Multimodal Reasoning Against Visual Degradation Paper • 2605.09262 • Published 7 days ago • 6
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 9 days ago • 64
On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows Paper • 2605.06110 • Published 10 days ago • 16
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration Paper • 2605.05566 • Published 10 days ago • 36
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 139
OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration Paper • 2602.08344 • Published Feb 9 • 5
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning Paper • 2602.08234 • Published Feb 9 • 76
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4 • 79
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32
Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation Paper • 2602.03619 • Published Feb 3 • 28
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing Paper • 2602.03845 • Published Feb 3 • 27
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published Jan 9 • 86
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published Dec 8, 2025 • 79