re paper - a that113 Collection

Updated 9 days ago

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18, 2025
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers

Paper • 2507.20527 • Published Jul 28, 2025
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Paper • 2507.16806 • Published Jul 22, 2025
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published Jul 29, 2025
Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28, 2025
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28, 2025
L0: Reinforcement Learning to Become General Agents

Paper • 2506.23667 • Published Jun 30, 2025
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

Paper • 2507.15778 • Published Jul 21, 2025
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Paper • 2506.19767 • Published Jun 24, 2025
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

Paper • 2506.04185 • Published Jun 4, 2025
TreeRPO: Tree Relative Policy Optimization

Paper • 2506.05183 • Published Jun 5, 2025
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

Paper • 2506.11902 • Published Jun 13, 2025
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

Paper • 2410.12934 • Published Oct 16, 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024
tencent/WeDLM-8B-Instruct

Text Generation • 8B • Updated 6 days ago