ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code • Paper • 2506.02314 • Published Jun 2, 2025
Reliable and Efficient Amortized Model-Based Evaluation • Collection • Datasets and Models for the REEval project • 24 items • Updated Sep 7, 2025