Dahoas (Alex Havrilla)

Articles 1

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Papers 3

arxiv:2412.02980

arxiv:2403.04642

arxiv:2402.10963

datasets 148

Dahoas/MATH

Viewer • Updated Jan 29, 2025 • 12.5k • 24

Dahoas/numina-synthetic

Viewer • Updated Dec 23, 2024 • 361k • 12

Dahoas/aimo-validation-aime

Viewer • Updated Dec 11, 2024 • 90 • 15

Dahoas/qwen-1.5-4B-default-positives-epoch-1-100

Viewer • Updated Dec 6, 2024 • 290k • 10

Dahoas/qwen-1.5-4B-tree-positives-epoch-2-100

Viewer • Updated Dec 6, 2024 • 491k • 8

Dahoas/qwen-1.5-4B-tree-positives-epoch-1-100

Viewer • Updated Dec 5, 2024 • 477k • 9

Dahoas/qwen-1.5-4B-epoch-1-test-100

Viewer • Updated Nov 28, 2024 • 498k • 6

Dahoas/qwen-1.5-4B-K-100-test

Viewer • Updated Nov 5, 2024 • 500k • 40

Dahoas/MATH_train_K_100_qwen_1.5_4B_outputs

Viewer • Updated Oct 22, 2024 • 750k • 26

Dahoas/MATH-K-100-train

Viewer • Updated Sep 12, 2024 • 750k • 19 • 2

models 33

Dahoas/gptj-rm-IHP

Dahoas/gptneox-response-full-static-sft

Dahoas/pythia-1B-response-full-static-sft

Dahoas/pythia-125M-response-full-static-sft

Dahoas/synthetic-pythia-6B-rm-sft-response

Dahoas/pythia-6B-sft-response-full-static

Dahoas/gptj-6B-response-full-static-sft

Dahoas/pythia-6B-rm-response-full-hh

Dahoas/gptj-response-full-sft

Dahoas/pythia-6b-rm-response-only-full-hh
