AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 596 items • Updated about 13 hours ago • 82
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context Paper • 2602.12108 • Published 5 days ago • 13
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 176
Article Community Evals: Because we're done trusting black-box leaderboards over the community • 13 days ago • 69
NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition Paper • 2507.18130 • Published Jul 24, 2025 • 1
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch Paper • 2601.13606 • Published 28 days ago • 11
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods Paper • 2601.21821 • Published 19 days ago • 59
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
Kimi-K2 Collection Moonshot's MoE LLMs with 1 trillion parameters, exceptional on agentic intelligence • 5 items • Updated 21 days ago • 172
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Paper • 2601.20209 • Published 20 days ago • 22
AgentDoG Collection A Diagnostic Guardrail Framework for AI Agent Safety and Security • 11 items • Updated 20 days ago • 103