- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 72
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  Text Generation • 141B • Updated • 221 • 269
- alvarobartt/mistral-orpo-mix
  Text Generation • 7B • Updated • 3 • 1
- alvarobartt/Mistral-7B-v0.1-ORPO
  Text Generation • 7B • Updated • 13 • 14
Collections
Collections including paper arxiv:2403.07691
- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 49
- Better & Faster Large Language Models via Multi-token Prediction
  Paper • 2404.19737 • Published • 81
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 72
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 116

- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 21
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 12
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 10

- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 72
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
  Paper • 2404.07738 • Published • 2
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 124