Generate code instantly from natural language prompts
Tracks the performance of LLMs, VLMs, and agents on web navigation tasks