SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 24 days ago • 30
nick11roberts/co-emerge-overtrained-rw-params37M_maxstep219586-flop_2_56e19_step_219586 56.5M • Updated 29 days ago • 125
nick11roberts/co-emerge-overtrained-rw-params84M_maxstep95981-flop_2_56e19_step_95981 0.1B • Updated 29 days ago • 123
nick11roberts/co-emerge-overtrained-rw-params149M_maxstep58415-flop_2_56e19_step_58415 0.2B • Updated 29 days ago • 127
nick11roberts/co-emerge-overtrained-rw-params9M_maxstep14128-flop_4_00e17_step_14128 17.9M • Updated Mar 17 • 81
nick11roberts/co-emerge-overtrained-rw-params7M_maxstep18165-flop_4_00e17_step_18165 14M • Updated Mar 17 • 80
nick11roberts/co-emerge-overtrained-rw-params22M_maxstep5779-flop_4_00e17_step_5779 37M • Updated Mar 17 • 78