view article Article Why Did MiniMax M2 End Up as a Full Attention Model? MiniMax-AI • Oct 30, 2025 • 80
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints Paper • 2510.08565 • Published Oct 9, 2025 • 21
Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation Paper • 2506.03621 • Published Jun 4, 2025 • 22
DataDecide Collection A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale. • 354 items • Updated Mar 2 • 26
view article Article Open-source DeepResearch – Freeing our search agents +3 m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k
view article Article How NuminaMath Won the 1st AIMO Progress Prize +6 yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif • Jul 11, 2024 • 128
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 456
view article Article Preference Tuning LLMs with Direct Preference Optimization Methods +3 kashif, edbeeching, lewtun, lvwerra, osanseviero • Jan 18, 2024 • 83
view article Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889