Models (520)

Active filters: rlhf

princeuser/llama-3.2-3b-sre-agent

Reinforcement Learning • Updated 5 days ago

AbdoSaad24/deepseek-coder-6.7b-security-dpo

Text Generation • 7B • Updated 4 days ago • 11

Julia569922/qwen2.5-0.5b-rlhf-sft

Updated 2 days ago • 22

Julia569922/qwen2.5-0.5b-rlhf-rm

Updated 2 days ago • 21

Julia569922/qwen2.5-0.5b-rlhf-ppo

Updated 2 days ago • 20

Julia569922/qwen2.5-0.5b-rlhf-dpo

Updated 2 days ago • 11

OhhMoo/sae-rl-qwen05b-strict

Updated 1 day ago

HumorR1/rm-qwen25vl-3b-20k

Updated about 21 hours ago

HumorR1/policy-qwen3vl-2b-grpo-newyorker

Updated about 8 hours ago

HumorR1/rm-qwen25vl-3b-nodesc

Updated about 5 hours ago