MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 49
Olmo 3.1 Collection The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 48
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27, 2025 • 54
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 127
Reinforcement Learning Teachers Collection Students distilled from a 7B Reinforcement-Learned Teacher (RLT) from the paper "Reinforcement Learning Teachers of Test Time Scaling." • 2 items • Updated Jun 22, 2025 • 9
MiniMax-M1 Collection MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated 27 days ago • 118
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 303
Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state-of-the-art open post-training recipes. • 32 items • Updated 10 days ago • 96
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 288
Welcome Falcon Mamba: The first strong attention-free 7B model Article • Published Aug 12, 2024 • 113
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12, 2024 • 139