Add 12 evaluation results to model-index
#19 opened by burtenshaw (HF Staff)
This PR adds structured evaluation results to the model-index metadata.
Extracted 12 benchmark results:
- GPQA: 50.3
- SuperGPQA: 32.2
- AIME25: 22.7
- HMMT25: 9.7
- ZebraLogic: 14.8
- LiveBench 20241125: 41.5
- IFEval: 74.5
- Creative Writing v3: 72.7
- WritingBench: 66.9
- MultiIF: 60.7
...
This metadata makes the results discoverable through the Hub and displayable in the model card widget.
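For illustration, here is a minimal sketch of how the scores listed above might be arranged into a `model-index` structure before being written to the model card's YAML front matter. The nested `task` / `dataset` / `metrics` layout follows the Hub's model-index convention, but the model name (`example-model`), the task type (`text-generation`), the metric type (`accuracy`), and the slug-style dataset identifiers are assumptions chosen for the example; they are not taken from this PR. The two results elided by `...` are left out rather than guessed.

```python
import json

# Scores copied from the PR description. The PR reports 12 results, but only
# these 10 are visible; the elided ones are intentionally not filled in.
SCORES = {
    "GPQA": 50.3,
    "SuperGPQA": 32.2,
    "AIME25": 22.7,
    "HMMT25": 9.7,
    "ZebraLogic": 14.8,
    "LiveBench 20241125": 41.5,
    "IFEval": 74.5,
    "Creative Writing v3": 72.7,
    "WritingBench": 66.9,
    "MultiIF": 60.7,
}


def slugify(name):
    """Hypothetical helper: derive a dataset identifier from a display name."""
    return name.lower().replace(" ", "-")


def build_model_index(model_name, scores,
                      task_type="text-generation",  # assumed task type
                      metric_type="accuracy"):      # assumed metric type
    """Build a model-index dict with one result entry per benchmark."""
    results = [
        {
            "task": {"type": task_type},
            "dataset": {"name": name, "type": slugify(name)},
            "metrics": [{"type": metric_type, "value": value}],
        }
        for name, value in scores.items()
    ]
    return {"model-index": [{"name": model_name, "results": results}]}


# "example-model" is a placeholder, not the model this PR targets.
model_index = build_model_index("example-model", SCORES)

if __name__ == "__main__":
    # Printed as JSON to stay dependency-free; the model card stores it as YAML.
    print(json.dumps(model_index, indent=2))
```

Keeping one entry per benchmark, each with its own `dataset` and `metrics` block, lets every score be looked up independently, which is what makes the results individually discoverable.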
Automatically extracted using the evaluation-on-hub skill.