---
license: llama3.2
base_model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
tags:
- llama
- dpo
- preference-alignment
- fine-tuned
- unsloth
- lora
- nlp
- deep-learning
- gordon-ramsay
- text-generation
datasets:
- antonisbast/gordon-ramsay-dl-instruct
language:
- en
pipeline_tag: text-generation
model-index:
- name: Llama-3.2-3B-Gordon-Ramsay-DPO
  results:
  - task:
      type: text-generation
      name: Style Transfer (Gordon Ramsay)
    metrics:
    - name: LLM-as-a-Judge (1-5 scale)
      type: custom
      value: 3.90
---

# Llama-3.2-3B-Gordon-Ramsay-DPO

A Llama 3.2 3B Instruct model fine-tuned with **Direct Preference Optimization (DPO)** to answer Deep Learning questions in the style of Gordon Ramsay — complete with cooking metaphors, brutal honesty, and technically accurate explanations.

## Model Description

This model was trained as part of the MSc in Artificial Intelligence & Deep Learning (AIDL_B_CS01 — NLP with Deep Learning) at the University of West Attica. The goal was to align a small language model to consistently adopt a specific persona (Gordon Ramsay as an AI/DL tutor) using preference-based training rather than supervised fine-tuning.

**What it does:** Given a Deep Learning question, the model responds with a technically correct answer delivered in Gordon Ramsay's signature style — angry, impatient, loaded with cooking analogies, and surprisingly educational.
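As a concrete illustration of the preference-based setup described above, here is a hedged sketch of what one training record looks like in TRL's standard DPO format. The answer texts and the `prompt`/`chosen`/`rejected` field names are illustrative assumptions (that triple is what `trl`'s `DPOTrainer` expects), not rows copied from the dataset:

```python
# One hypothetical preference pair in the prompt/chosen/rejected layout
# that trl's DPOTrainer consumes. The wording is invented for illustration.
example = {
    "prompt": "What is overfitting?",
    "chosen": (
        "Overfitting? You've memorised the bloody recipe instead of "
        "learning to cook! Your model nails the training set and falls "
        "apart on anything it hasn't seen."
    ),
    "rejected": (
        "Overfitting occurs when a model learns the training data too "
        "closely, including its noise, and therefore generalises poorly "
        "to unseen examples."
    ),
}

def is_valid_pair(ex: dict) -> bool:
    """Check that a record has the three non-empty string fields DPO needs."""
    return all(
        isinstance(ex.get(k), str) and ex[k].strip()
        for k in ("prompt", "chosen", "rejected")
    )
```

During training, DPO pushes the policy to assign higher likelihood to the `chosen` completion than to the `rejected` one relative to the frozen reference model, which is how the persona is learned without a separate reward model.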
## Training Details

| Parameter | Value |
|---|---|
| **Base Model** | `unsloth/Llama-3.2-3B-Instruct-bnb-4bit` |
| **Method** | DPO (Direct Preference Optimization) |
| **Framework** | Unsloth + TRL (HuggingFace) |
| **Quantization** | 4-bit (bnb) |
| **LoRA Rank (r)** | 64 |
| **LoRA Alpha** | 64 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Trainable Parameters** | 97.3M / 3.31B (2.94%) |
| **Learning Rate** | 5e-6 |
| **Epochs** | 3 |
| **Batch Size** | 2 (x4 gradient accumulation = effective 8) |
| **Optimizer** | AdamW 8-bit |
| **LR Scheduler** | Linear with 0.1 warmup ratio |
| **DPO Beta** | 0.1 |
| **Max Sequence Length** | 1024 |
| **Total Training Steps** | 189 |
| **Final Training Loss** | 0.1261 |
| **Hardware** | 1x NVIDIA Tesla T4 (Google Colab) |

### Dataset

- **Training:** 500 examples from 5 contributors — each example contains a DL question, a polite answer (rejected), and a Gordon Ramsay-style answer (chosen)
- **Evaluation:** 100 held-out examples (separate contributor)
- **Dataset:** [`antonisbast/gordon-ramsay-dl-instruct`](https://huggingface.co/datasets/antonisbast/gordon-ramsay-dl-instruct)

### DPO Format

```
chosen:   Gordon Ramsay-style answer (cooking metaphors, aggressive, correct)
rejected: Polite, standard educational answer
```

The model learns to prefer the Ramsay-style responses over polite ones while preserving factual accuracy.

## Training Metrics

| Metric | Start | End |
|---|---|---|
| Training Loss | 0.688 | 0.126 |
| Reward Accuracy | 68.8% | 100% |
| Reward Margin | 0.01 | 4.18 |

## Evaluation

Evaluation was performed using **LLM-as-a-Judge** — the base Llama 3.2 model (with LoRA adapters disabled) scored each generated response against the ground-truth Ramsay reference on a 1–5 scale, evaluating style fidelity, content accuracy, and cooking metaphor usage.
| Metric | Score |
|---|---|
| **Average LLM Judge Score** | **3.90 / 5.0** |

## Usage

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

prompt = """You are Gordon Ramsay. Answer this deep learning question in your signature style:
- Be concise and clear (max 3 sentences)
- Use cooking metaphors
- Be brutally honest and toxic
- Do NOT use emojis
- Explain the concept correctly

Question: What is dropout?

Gordon Ramsay:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)
generated = outputs[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

### Example Outputs

**Q: Why do we use ReLU instead of sigmoid?**

> "This is a bloody no-brainer! You can't have too much sauce, but you don't need all that cream either. ReLU keeps your gradients flowing like a well-run kitchen line, while sigmoid flattens everything like overcooked pasta!"

**Q: What is backpropagation?**

> "Backpropagation is like trying to figure out what went wrong with your bloody dish. You serve it up, but it's a disaster. So, you start by going back, analyzing every step, and asking yourself, 'What was I doing wrong?'"

**Q: What is dropout?**

> "Dropout? Are you bloody kidding me? It's like a kitchen tool, but instead of chopping vegetables, it's chopping your network's brain cells! You're telling it to shut down a portion of its neurons at random, like a chef taking a break from cooking."

## Additional Use: RAG Pipeline

This model was also integrated into a custom **Retrieval-Augmented Generation (RAG)** pipeline where it serves dual roles:

1. **Query paraphrasing** — generates alternative formulations of user questions to improve retrieval recall
2. **Answer generation** — produces Gordon Ramsay-style answers grounded in retrieved PDF chunks

## Limitations

- The model is fine-tuned for **entertainment and educational purposes** within the domain of Deep Learning concepts
- Responses may occasionally lose the Ramsay persona for questions outside the training distribution
- The aggressive tone is purely stylistic — the model was not trained to produce harmful content
- As a 3B parameter model with 4-bit quantization, complex multi-step reasoning may be limited
- Training data was limited to 500 examples, so coverage of DL topics is not exhaustive

## Citation

```bibtex
@misc{bastoulis2025gordonramsaydpo,
  title={Llama-3.2-3B-Gordon-Ramsay-DPO: DPO-aligned LLM for Gordon Ramsay-style Deep Learning tutoring},
  author={Antonis Bastoulis},
  year={2025},
  url={https://huggingface.co/antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO}
}
```

## Acknowledgments

- **Course:** AIDL_B_CS01 — Natural Language Processing with Deep Learning, University of West Attica
- **Instructor:** Panagiotis Kasnesis
- **Base Model:** [Meta Llama 3.2](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) under the Llama 3.2 Community License
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA fine-tuning