AI & ML interests
NLP, RL
Organizations
Illustrating Reinforcement Learning from Human Feedback (RLHF)
• Updated • 12.5k • 24
• Updated • 361k • 12
Dahoas/aimo-validation-aime
• Updated • 90 • 15
Dahoas/qwen-1.5-4B-default-positives-epoch-1-100
• Updated • 290k • 10
Dahoas/qwen-1.5-4B-tree-positives-epoch-2-100
• Updated • 491k • 8
Dahoas/qwen-1.5-4B-tree-positives-epoch-1-100
• Updated • 477k • 9
Dahoas/qwen-1.5-4B-epoch-1-test-100
• Updated • 498k • 6
Dahoas/qwen-1.5-4B-K-100-test
• Updated • 500k • 40
Dahoas/MATH_train_K_100_qwen_1.5_4B_outputs
• Updated • 750k • 26
• Updated • 750k • 19