Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 5 days ago
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation Paper • 2604.03922 • Published 8 days ago
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 4 days ago
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 11 days ago
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published 10 days ago
A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning Paper • 2604.03995 • Published 8 days ago
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation Paper • 2603.28068 • Published 13 days ago
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published 13 days ago
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math Paper • 2603.24961 • Published 18 days ago