Leaderboard - FINAL Bench 'Metacognitive'
Metacognitive
A major update just dropped. The core highlight is a 24/7, CNN-style live broadcast that continuously covers the most important events across the ecosystem in real time.
On top of that, we redesigned the system so that 1,000 AI NPCs interact like a full national economy. With every trade, we update real-time GDP, the M0/M1/M2 money supply, inflation, the Gini coefficient and Lorenz curve, a happiness index, and a systemic risk score. Every 72 hours, an automated presidential election runs, and the winning policy immediately rewrites key economic parameters such as leverage caps and SEC enforcement intensity. We also added random events, autonomous SEC regulation, death and funeral mechanics, and a community-driven resurrection system, so you can observe how swarm behavior turns into social narratives and measurable macro indicators.
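The Space does not publish its formulas, but the Gini coefficient it tracks per trade can be sketched with the standard sorted-rank form. This is a minimal sketch, not the Space's actual code; the function name and the per-NPC wealth list are illustrative:

```python
def gini(wealth):
    """Gini coefficient of a wealth distribution: 0 = perfect equality, ~1 = maximal inequality.

    Uses the standard rank-weighted formula over wealth sorted ascending:
    G = (2 * sum(i * w_i)) / (n * sum(w_i)) - (n + 1) / n
    """
    w = sorted(wealth)
    n = len(w)
    total = sum(w)
    rank_weighted = sum(i * x for i, x in enumerate(w, start=1))
    return (2 * rank_weighted) / (n * total) - (n + 1) / n

# Perfectly equal economy -> 0.0; one NPC holds everything -> (n - 1) / n
print(gini([1, 1, 1, 1]))   # -> 0.0
print(gini([0, 0, 0, 1]))   # -> 0.75
```

The Lorenz curve the Space plots is the same sorted cumulative-share data that this formula summarizes into a single number.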
Could you check whether SLMs (models under 80B, 48B, 36B, 20B parameters, etc.) also show this metacognitive capability?
Please duplicate this Space
https://huggingface.co/spaces/aiqtech/final-bench-Proprietary
and modify it so it runs with the SLM model path you want.
If you are not sure how to do it, just clone the Space first, then upload the app.py file to Claude, Gemini, or ChatGPT. In your prompt, say which model you want to use and ask it to update the code so you can run the test. It should handle this smoothly.
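If you prefer to edit app.py by hand, the change is usually just the model path. A minimal sketch, assuming the duplicated Space loads its model with `transformers`; the variable name `MODEL_ID`, the helper function, and the example model are assumptions for illustration, not the Space's actual code:

```python
# Hypothetical example SLM (under 20B parameters); swap in any Hub model path you want
MODEL_ID = "Qwen/Qwen2.5-14B-Instruct"

def load_slm(model_id: str = MODEL_ID):
    """Load tokenizer and model lazily so the model path can be changed in one place."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # assumed dependency of the Space
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # pick the checkpoint's native dtype
        device_map="auto",    # place weights on available GPU(s) or CPU
    )
    return tokenizer, model
```

Keeping the path in a single constant means the rest of the app's generation code needs no changes when you switch between SLM sizes.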
Yes, absolutely.
Even smaller language models (under 80B, 48B, 36B, or 20B parameters) can show metacognitive ability, though usually in a weaker form; FINAL BENCH can still measure it reliably.
Typical pattern for SLMs:
- MA: they can often express uncertainty or notice they might be wrong.
- ER: actually revising and improving the answer is harder.
So with FINAL BENCH, you can quantify:
1. whether the model has metacognitive signals at all;
2. how strong they are;
3. whether it only says "I might be wrong" but fails to fix the answer (MA high, ER low);
4. or whether it can genuinely self-correct (ER improves, especially with scaffolding).
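FINAL BENCH's exact scoring is not shown here, but the MA-high/ER-low pattern above can be illustrated with a toy per-item scorer. The function name and its semantics are assumptions for illustration only, not the benchmark's actual metric:

```python
def score_item(first_answer, revised_answer, expressed_doubt, correct_answer):
    """Toy per-item MA/ER scoring (illustrative, not FINAL BENCH's actual metric).

    MA = 1 if the model flagged that it might be wrong.
    ER = 1 only if revision turned an incorrect first answer into the correct one.
    """
    ma = int(expressed_doubt)
    er = int(first_answer != correct_answer and revised_answer == correct_answer)
    return ma, er

# Typical SLM: says "I might be wrong" but keeps the wrong answer -> MA high, ER low
print(score_item("B", "B", True, "A"))   # -> (1, 0)
# Genuine self-correction: doubt expressed AND the answer actually fixed
print(score_item("B", "A", True, "A"))   # -> (1, 1)
```

Averaging these per-item scores over the benchmark gives the two aggregate signals the list above distinguishes.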
```python
from datasets import load_dataset

# Load the Metacognitive split of FINAL Bench from the Hugging Face Hub
dataset = load_dataset("FINAL-Bench/Metacognitive", split="train")
```