How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs Paper • 2501.10711 • Published Jan 18, 2025
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench Paper • 2506.09289 • Published Jun 10, 2025