AlexCuadron/SWE-Bench-Verified-O1-native-tool-calling-reasoning-high-results • Updated Jan 14, 2025 • 500 • 234 • 3
Open LLM Leaderboard 🏆 • 13.9k • Track, rank and evaluate open LLMs and chatbots