MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering Paper • 2601.22859 • Published 17 days ago • 17
Assessing Demographic Bias in Named Entity Recognition Paper • 2008.03415 • Published Aug 8, 2020
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models Paper • 2509.24239 • Published Sep 29, 2025 • 4