Add evaluation results: HLE, GPQA Diamond, SWE-bench Verified, Terminal-Bench 2.0

#2
Opened by SaylorTwift (HF Staff)
Ready to merge
This branch is ready to get merged automatically.
