Add 12 evaluation results to model-index

#19 opened by burtenshaw (HF Staff)

This PR adds structured evaluation results to the model-index metadata.

Extracted 12 benchmark results:

  • GPQA: 50.3
  • SuperGPQA: 32.2
  • AIME25: 22.7
  • HMMT25: 9.7
  • ZebraLogic: 14.8
  • LiveBench 20241125: 41.5
  • IFEval: 74.5
  • Creative Writing v3: 72.7
  • WritingBench: 66.9
  • MultiIF: 60.7
    ...

This metadata makes the results discoverable through the Hub and displayable in the model card widget.

Automatically extracted using the evaluation-on-hub skill.
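
For reference, a minimal sketch of how results like those listed above could map onto the `model-index` structure used in model card metadata. This is not the output of the extraction skill. The model name `example-model`, the task type `text-generation`, the metric type `score`, and the slugified dataset ids are all assumptions made for illustration, and only the ten scores shown in this description are included.

```python
import json

# Scores copied from the list in this PR description.
# The remaining results are elided there, so they are not reproduced here.
SCORES = {
    "GPQA": 50.3,
    "SuperGPQA": 32.2,
    "AIME25": 22.7,
    "HMMT25": 9.7,
    "ZebraLogic": 14.8,
    "LiveBench 20241125": 41.5,
    "IFEval": 74.5,
    "Creative Writing v3": 72.7,
    "WritingBench": 66.9,
    "MultiIF": 60.7,
}


def build_model_index(model_name, scores, task_type="text-generation", metric_type="score"):
    """Build a model-index style dict with one result entry per benchmark.

    task_type and metric_type are placeholder assumptions, not values
    taken from this PR.
    """
    results = [
        {
            "task": {"type": task_type},
            # The dataset "type" is a hypothetical id derived from the display name.
            "dataset": {"name": name, "type": name.lower().replace(" ", "-")},
            "metrics": [{"type": metric_type, "value": value}],
        }
        for name, value in scores.items()
    ]
    return {"model-index": [{"name": model_name, "results": results}]}


# "example-model" is a placeholder; the real model name is not stated here.
model_index = build_model_index("example-model", SCORES)
print(json.dumps(model_index, indent=2))
```

Serialized as YAML in the model card front matter, each entry carries a task, a dataset, and a list of metrics.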

