Introducing WM Bench: A Benchmark for Cognitive Intelligence in World Models • 1 day ago • 13
Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do • 21 days ago • 38
MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning • 22 days ago • 15
Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework • 23 days ago • 12
Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? • Feb 24 • 17
FINAL Bench Collection World's First Functional Metacognition Benchmark. "Not how much AI knows, but whether it knows what it doesn't know, and can fix it." • 2 items • Updated Feb 21 • 4