RLCR - a mehuldamani Collection

RLCR

updated Aug 6, 2025

Collection of models and datasets for "Beyond Binary Rewards: Training LMs to Reason about their Uncertainty"

mehuldamani/big-math-digits-v2-correctness
Text Generation • 8B • Updated Jun 25, 2025 • 6

mehuldamani/hotpot-v2-correctness-7b
Text Generation • 8B • Updated Jul 29, 2025 • 32

mehuldamani/orm-big-math-digits-v2-correctness
Text Classification • 7B • Updated Jul 8, 2025 • 8

mehuldamani/big-math-digits-v2-brier
8B • Updated Aug 4, 2025 • 49

mehuldamani/big-math-digits
Viewer • Updated Aug 5, 2025 • 31k • 67

mehuldamani/hotpot_qa
Viewer • Updated Aug 5, 2025 • 20.5k • 172

mehuldamani/hotpot-v2-brier-7b-no-split
Text Generation • 8B • Updated Jun 5, 2025 • 32

mehuldamani/big-math-digits-v2-brier-base-tabc
Text Generation • 8B • Updated Jun 28, 2025 • 7

mehuldamani/orm-hotpot-v2-final-correctness
Text Classification • 7B • Updated Jun 9, 2025 • 5

mehuldamani/qwen-base-verifier-sft-v1
Text Generation • 8B • Updated Jun 13, 2025 • 62