BigCodeArena 🚀 Compare two AI models by sending them code and seeing their responses
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution • Paper • 2510.08697 • Published Oct 9, 2025