SaylorTwift's Collections
benchmarks
Updated 1 day ago
meituan-longcat/LARYBench • Updated 2 days ago • 5.11k • 15
llamaindex/ParseBench • Benchmark • Updated 5 days ago • 169k • 13.9k • 69
nvidia/QCalEval • Viewer • Updated 10 days ago • 243 • 666 • 14
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 3.64k • 196
LongHorizonReasoning/longcot • Viewer • Updated 3 days ago • 5k • 444 • 9
mercor/apex-agents • Viewer • Updated Mar 3 • 480 • 52.3k • 117
hsiung/MagicBench • Viewer • Updated 5 days ago • 50 • 119 • 10
openlifescienceai/medmcqa • Viewer • Updated Jan 4, 2024 • 193k • 28.7k • 222
ShadenA/MathNet • Updated about 7 hours ago • 5.04k • 39