CU-Benchmarks
visualwebbench/VisualWebBench • 1.54k • 1.07k • 18
rootsautomation/RICO-ScreenQA • 86k • 175 • 11
rootsautomation/ScreenSpot • 1.27k • 1.05k • 44
osunlp/Multimodal-Mind2Web • 14.2k • 2.89k • 91
xlangai/ubuntu_osworld_file_cache • 368k • 3
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale • Paper • arXiv:2409.08264 • 48
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents • Paper • arXiv:2405.14573