view article Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +6 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • 22 days ago • 41
view article Article Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models nvidia • 26 days ago • 34
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 271
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 160
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 169