RLCR - a mehuldamani Collection

RLCR

Updated Aug 6, 2025

Collection of models and datasets for the paper "Beyond Binary Rewards: Training LMs to Reason about their Uncertainty"

mehuldamani/big-math-digits-v2-correctness

Text Generation • 8B • Updated Jun 25, 2025 • 19
mehuldamani/hotpot-v2-correctness-7b

Text Generation • 8B • Updated Jul 29, 2025 • 2
mehuldamani/orm-big-math-digits-v2-correctness

Text Classification • 7B • Updated Jul 8, 2025 • 48
mehuldamani/big-math-digits-v2-brier

8B • Updated Aug 4, 2025 • 47
mehuldamani/big-math-digits

Viewer • Updated Aug 5, 2025 • 31k • 904
mehuldamani/hotpot_qa

Viewer • Updated Aug 5, 2025 • 20.5k • 1.2k
mehuldamani/hotpot-v2-brier-7b-no-split

Text Generation • 8B • Updated Jun 5, 2025 • 50
mehuldamani/big-math-digits-v2-brier-base-tabc

Text Generation • 8B • Updated Jun 28, 2025 • 63
mehuldamani/orm-hotpot-v2-final-correctness

Text Classification • 7B • Updated Jun 9, 2025 • 15
mehuldamani/qwen-base-verifier-sft-v1

Text Generation • 8B • Updated Jun 13, 2025 • 721