Article • Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do • 1 day ago • 31
Article • MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning • 3 days ago • 13
Article • Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework • 4 days ago • 11
FINAL Bench Collection • World's First Functional Metacognition Benchmark. "Not how much AI knows – but whether it knows what it doesn't know, and can fix it." • 2 items • Updated 19 days ago • 4
Article • Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? • 16 days ago • 17