BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation Paper • 2604.09497 • Published 13 days ago • 29
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems Paper • 2604.14228 • Published 9 days ago • 23
Cross-Tokenizer LLM Distillation through a Byte-Level Interface Paper • 2604.07466 • Published 10 days ago • 5
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data Paper • 2604.14164 • Published Mar 23 • 34
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 48
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 28 days ago • 30
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 102
Nemotron-Terminal Collection We are releasing Nemotron-Terminal models and training datasets. • 5 items • Updated 3 days ago • 34
Revisiting the Platonic Representation Hypothesis: An Aristotelian View Paper • 2602.14486 • Published Feb 16 • 12
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook Paper • 2602.14299 • Published Feb 15 • 27
Health AI Developer Foundations (HAI-DEF) Collection Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 22 items • Updated Mar 12 • 214
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published Feb 6 • 24
Revisiting the Shape Convention of Transformer Language Models Paper • 2602.06471 • Published Feb 6 • 4
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published Feb 5 • 52