-
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Paper • 2508.07629 • Published • 42 -
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Paper • 2508.07101 • Published • 13 -
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper • 2508.03346 • Published • 7 -
Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Paper • 2508.08940 • Published • 27
Collections
Discover the best community collections!
Collections including paper arxiv:2509.08827
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 491 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 491 -
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 140 -
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 266 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 139
-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 660 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 345 -
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 239 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225
-
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 239 -
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Paper • 2509.03867 • Published • 210 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
Why Language Models Hallucinate
Paper • 2509.04664 • Published • 193
-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 138 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88
-
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 189 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 132 -
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper • 2510.23607 • Published • 174 -
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper • 2510.08673 • Published • 125
-
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
Paper • 2509.19371 • Published -
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Paper • 2505.06708 • Published • 6 -
Selective Attention: Enhancing Transformer through Principled Context Control
Paper • 2411.12892 • Published -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 189
-
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Paper • 2508.07629 • Published • 42 -
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Paper • 2508.07101 • Published • 13 -
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper • 2508.03346 • Published • 7 -
Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Paper • 2508.08940 • Published • 27
-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 138 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 491 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 189 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 132 -
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper • 2510.23607 • Published • 174 -
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper • 2510.08673 • Published • 125
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 491 -
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 140 -
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 266 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 139
-
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
Paper • 2509.19371 • Published -
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Paper • 2505.06708 • Published • 6 -
Selective Attention: Enhancing Transformer through Principled Context Control
Paper • 2411.12892 • Published -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 189
-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 660 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 345 -
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 239 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225
-
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 239 -
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Paper • 2509.03867 • Published • 210 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
Why Language Models Hallucinate
Paper • 2509.04664 • Published • 193