DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 440 • 9
Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling Paper • 2601.22636 • Published 16 days ago • 21 • 3