Llama-3.2-3B-Gordon-Ramsay-DPO

A Llama 3.2 3B Instruct model fine-tuned with Direct Preference Optimization (DPO) to answer Deep Learning questions in the style of Gordon Ramsay — complete with cooking metaphors, brutal honesty, and technically accurate explanations.

Model Description

This model was trained as part of the MSc in Artificial Intelligence & Deep Learning (AIDL_B_CS01 — NLP with Deep Learning) at the University of West Attica. The goal was to align a small language model to consistently adopt a specific persona (Gordon Ramsay as an AI/DL tutor) using preference-based training rather than supervised fine-tuning.

What it does: Given a Deep Learning question, the model responds with a technically correct answer delivered in Gordon Ramsay's signature style — angry, impatient, loaded with cooking analogies, and surprisingly educational.

Training Details

| Parameter | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-3B-Instruct-bnb-4bit |
| Method | DPO (Direct Preference Optimization) |
| Framework | Unsloth + TRL (Hugging Face) |
| Quantization | 4-bit (bitsandbytes) |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 64 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 97.3M / 3.31B (2.94%) |
| Learning Rate | 5e-6 |
| Epochs | 3 |
| Batch Size | 2 (×4 gradient accumulation = effective 8) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear with 0.1 warmup ratio |
| DPO Beta | 0.1 |
| Max Sequence Length | 1024 |
| Total Training Steps | 189 |
| Final Training Loss | 0.1261 |
| Hardware | 1× NVIDIA Tesla T4 (Google Colab) |
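A run with these hyperparameters can be approximated with TRL's `DPOTrainer`. The sketch below is illustrative, not the project's exact training script: argument names follow recent TRL/Unsloth releases and may differ in older versions, and the dataset is assumed to already carry the `prompt`/`chosen`/`rejected` columns DPO expects.

```python
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer
from datasets import load_dataset

# Load the 4-bit base model and attach LoRA adapters (r = alpha = 64)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("antonisbast/gordon-ramsay-dl-instruct", split="train")

args = DPOConfig(
    beta=0.1,
    learning_rate=5e-6,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    optim="adamw_8bit",
    output_dir="outputs",
)

# ref_model=None: TRL reuses the frozen base weights (adapters disabled)
# as the reference policy, so no second model copy is needed
trainer = DPOTrainer(model=model, ref_model=None, args=args,
                     train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```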

Dataset

  • Training: 500 examples from 5 contributors — each example contains a DL question, a polite answer (rejected), and a Gordon Ramsay-style answer (chosen)
  • Evaluation: 100 held-out examples (separate contributor)
  • Dataset: antonisbast/gordon-ramsay-dl-instruct

DPO Format

chosen:  Gordon Ramsay-style answer (cooking metaphors, aggressive, correct)
rejected: Polite, standard educational answer

The model learns to prefer the Ramsay-style responses over polite ones while preserving factual accuracy.
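Concretely, DPO minimizes the following objective (Rafailov et al., 2023), where y_w is the chosen Ramsay-style answer, y_l the rejected polite answer, π_ref the frozen base model, and β = 0.1:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Increasing the log-probability ratio of chosen over rejected completions drives the style shift without an explicit reward model, and anchoring to π_ref discourages the policy from drifting away from the base model's factual knowledge.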

Training Metrics

| Metric | Start | End |
|---|---|---|
| Training Loss | 0.688 | 0.126 |
| Reward Accuracy | 68.8% | 100% |
| Reward Margin | 0.01 | 4.18 |
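For readers unfamiliar with these metrics: DPO's implicit reward for a completion is β times its log-probability ratio between the policy and the reference model; the reward margin is the chosen-minus-rejected difference, and reward accuracy is the fraction of pairs with a positive margin. A minimal sketch with toy log-probabilities (hypothetical numbers, not from the actual run):

```python
def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    # DPO's implicit reward: beta * log-prob ratio vs. the frozen reference model
    return beta * (logp_policy - logp_ref)

def reward_margin(chosen, rejected, beta=0.1):
    # chosen/rejected are (logp_policy, logp_ref) pairs for the two completions
    return implicit_reward(*chosen, beta) - implicit_reward(*rejected, beta)

# A well-separated pair: the policy has learned to favor the chosen answer
margin = reward_margin(chosen=(-40.0, -60.0), rejected=(-75.0, -55.0))
print(margin)  # 4.0, comparable to the 4.18 margin reached at the end of training
```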

Evaluation

Evaluation was performed using LLM-as-a-Judge — the base Llama 3.2 model (with LoRA adapters disabled) scored each generated response against the ground-truth Ramsay reference on a 1–5 scale evaluating style fidelity, content accuracy, and cooking metaphor usage.

| Metric | Score |
|---|---|
| Average LLM Judge Score | 3.90 / 5.0 |
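The judging step amounts to prompting the base model with the reference and candidate answers and parsing a numeric score from its reply. A minimal sketch of that scoring scaffold; the prompt wording and score format here are hypothetical, since the project's exact judge prompt is not published:

```python
import re
from typing import Optional

# Hypothetical judge prompt template (the project's actual wording may differ)
JUDGE_PROMPT = (
    "You are grading a Gordon Ramsay-style answer against a reference answer.\n"
    "Rate 1-5 for style fidelity, content accuracy, and cooking-metaphor usage.\n"
    "Reference: {reference}\nCandidate: {candidate}\n"
    "Reply with a single line: Score: <1-5>"
)

def parse_score(judge_output: str) -> Optional[int]:
    # Extract the first 1-5 digit after "Score:"; None if the judge went off-script
    m = re.search(r"Score:\s*([1-5])", judge_output)
    return int(m.group(1)) if m else None

print(parse_score("Score: 4 - great metaphors, minor factual slip"))  # 4
```

Averaging `parse_score` over the 100 held-out examples (skipping `None` replies) yields the aggregate score reported above.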

Usage

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO",
    max_seq_length=2048,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

prompt = """You are Gordon Ramsay. Answer this deep learning question in your signature style:
- Be concise and clear (max 3 sentences)
- Use cooking metaphors
- Be brutally honest and toxic
- Do NOT use emojis
- Explain the concept correctly

Question: What is dropout?

Gordon Ramsay:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
generated = outputs[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

Example Outputs

Q: Why do we use ReLU instead of sigmoid?

"This is a bloody no-brainer! You can't have too much sauce, but you don't need all that cream either. ReLU keeps your gradients flowing like a well-run kitchen line, while sigmoid flattens everything like overcooked pasta!"

Q: What is backpropagation?

"Backpropagation is like trying to figure out what went wrong with your bloody dish. You serve it up, but it's a disaster. So, you start by going back, analyzing every step, and asking yourself, 'What was I doing wrong?'"

Q: What is dropout?

"Dropout? Are you bloody kidding me? It's like a kitchen tool, but instead of chopping vegetables, it's chopping your network's brain cells! You're telling it to shut down a portion of its neurons at random, like a chef taking a break from cooking."

Additional Use: RAG Pipeline

This model was also integrated into a custom Retrieval-Augmented Generation (RAG) pipeline, where it serves two roles:

  1. Query paraphrasing — generates alternative formulations of user questions to improve retrieval recall
  2. Answer generation — produces Gordon Ramsay-style answers grounded in retrieved PDF chunks
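The dual-role flow can be sketched as below. This is a library-free illustration, not the actual pipeline: in practice both `generate_paraphrases()` and `answer()` would call the fine-tuned model, and retrieval would use embedding search over PDF chunks rather than keyword overlap. All names and data here are hypothetical.

```python
# Toy corpus standing in for retrieved PDF chunks
CHUNKS = [
    "Dropout randomly zeroes activations during training to reduce overfitting.",
    "ReLU outputs max(0, x) and avoids sigmoid's vanishing gradients.",
]

def generate_paraphrases(question):
    # Role 1 (stubbed): the model rewrites the question to improve retrieval recall
    return [question, question.lower().replace("what is", "explain")]

def retrieve(queries, chunks=CHUNKS, k=1):
    # Keyword-overlap scoring as a stand-in for embedding similarity
    def overlap(q, c):
        return len(set(q.lower().replace("?", "").split()) & set(c.lower().split()))
    ranked = sorted(chunks, key=lambda c: max(overlap(q, c) for q in queries),
                    reverse=True)
    return ranked[:k]

def answer(question, context):
    # Role 2 (stubbed): the model grounds a Ramsay-style answer in the retrieved text
    return f"Listen up! Based on: {context[0]}"

ctx = retrieve(generate_paraphrases("What is dropout?"))
print(answer("What is dropout?", ctx))
```

Paraphrasing before retrieval widens the net (any rewording that hits the right chunk helps), while grounding the final answer in retrieved text keeps the persona-heavy output factually anchored.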

Limitations

  • The model is fine-tuned for entertainment and educational purposes within the domain of Deep Learning concepts
  • Responses may occasionally lose the Ramsay persona for questions outside the training distribution
  • The aggressive tone is purely stylistic — the model was not trained to produce harmful content
  • As a 3B parameter model with 4-bit quantization, complex multi-step reasoning may be limited
  • Training data was limited to 500 examples, so coverage of DL topics is not exhaustive

Citation

```bibtex
@misc{bastoulis2025gordonramsaydpo,
  title={Llama-3.2-3B-Gordon-Ramsay-DPO: DPO-aligned LLM for Gordon Ramsay-style Deep Learning tutoring},
  author={Antonis Bastoulis},
  year={2025},
  url={https://huggingface.co/antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO}
}
```

Acknowledgments

  • Course: AIDL_B_CS01 — Natural Language Processing with Deep Learning, University of West Attica
  • Instructor: Panagiotis Kasnesis
  • Base Model: Meta Llama 3.2 under the Llama 3.2 Community License
  • Training Framework: Unsloth for efficient LoRA fine-tuning


Uploaded model

  • Developed by: antonisbast
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3.2-3b-instruct-bnb-4bit

This llama model was trained 2x faster with Unsloth
