Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions • Paper • 2606.09076 • Published 4 days ago • 47
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application • Paper • 2606.12191 • Published 1 day ago • 55
Redesign Mixture-of-Experts Routers with Manifold Power Iteration • Paper • 2606.12397 • Published 1 day ago • 74
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning • Paper • 2606.03108 • Published 10 days ago • 9
Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization • Paper • 2606.12373 • Published 1 day ago • 6
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models • Paper • 2606.11324 • Published 3 days ago • 6
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling • Paper • 2606.12370 • Published 1 day ago • 15
ICA Lens: Interpreting Language Models Without Training Another Dictionary • Paper • 2606.11722 • Published 1 day ago • 14
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning • Paper • 2606.11683 • Published 1 day ago • 27
DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch • Paper • 2606.10728 • Published 3 days ago • 27
TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders • Paper • 2606.09323 • Published 4 days ago • 45
SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning • Paper • 2602.02472 • Published Feb 2 • 47
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published Dec 29, 2025 • 99