THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Paper • 2601.23143 • Published 4 days ago • 38
Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published 6 days ago • 14
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 29 days ago • 105
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published 24 days ago • 52
Nested Learning: The Illusion of Deep Learning Architectures Paper • 2512.24695 • Published Dec 31, 2025 • 42
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published 25 days ago • 43
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 21 days ago • 145
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published 21 days ago • 38
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published Dec 23, 2025 • 85
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Paper • 2601.07372 • Published 22 days ago • 40
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published 27 days ago • 52
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs Paper • 2601.06786 • Published 24 days ago • 6
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published 23 days ago • 24
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published 26 days ago • 82