# Fine-Tuned Gemma 2 9B-IT for Automated Essay Grading (LoRA Adapter)
A LoRA adapter fine-tuned on Gemma 2 9B-IT for automated grading of essay questions in undergraduate business education. The model evaluates student responses using a 4-criterion rubric and produces grades on a 0-5 scale.
## Model Details
- Base model: google/gemma-2-9b-it (via unsloth/gemma-2-9b-it)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Language: English
- License: Apache 2.0
- Developed by: Kamal Abdul-Fattah
## Training Details
### Dataset
- 2,550 student responses across 85 questions (30 responses per question)
- 4 rubric criteria: Clarity (0-2), Terminology (0-2), Coverage (0-2), Accuracy (0-4)
- Final grade = sum of the four rubric scores (max 2 + 2 + 2 + 4 = 10 points) / 2 → 0-5 scale
- Question-level stratified split: 51 train / 17 validation / 17 test questions (60/20/20%)
- Dataset: Zenodo (DOI: 10.5281/zenodo.18856922)
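The rubric-to-grade mapping above can be sketched in a few lines (the function name and signature are illustrative, not taken from the training code):

```python
def final_grade(clarity: int, terminology: int, coverage: int, accuracy: int) -> float:
    """Map the four rubric scores (max 2 + 2 + 2 + 4 = 10 points) onto the 0-5 grade scale."""
    assert 0 <= clarity <= 2 and 0 <= terminology <= 2
    assert 0 <= coverage <= 2 and 0 <= accuracy <= 4
    return (clarity + terminology + coverage + accuracy) / 2

print(final_grade(2, 2, 2, 4))  # a perfect response scores 5.0
print(final_grade(1, 1, 1, 2))  # a middling response scores 2.5
```

Note that halving the 0-10 rubric sum yields grades in 0.5-point steps.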
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 80 |
| Alpha | 80 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task type | CAUSAL_LM |
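The table above corresponds to a `peft` configuration along these lines (a sketch; argument names follow the `peft` `LoraConfig` API, and the surrounding training code is not reproduced here):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=80,                 # LoRA rank
    lora_alpha=80,        # scaling alpha (alpha / r = 1.0)
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```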
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Epochs | 3 |
| Batch size | 2 (with gradient accumulation 4 → effective 8) |
| Precision | bf16 |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 5 |
| LR scheduler | Linear |
| Hardware | Google Colab (NVIDIA T4 GPU) |
## Evaluation Results
Evaluated on a held-out test set of 510 samples (17 questions × 30 responses):
| Model | ±1.0 Accuracy | MAE | QWK | Off-Topic Acc |
|---|---|---|---|---|
| Fine-tuned Gemma 2 9B (this model) | 78.4% | 0.713 | 0.821 | 93.1% |
| Zero-shot Gemma 2 9B | 72.4% | 0.875 | 0.761 | 77.3% |
| Claude Opus 4.6 | 77.1% | 0.749 | 0.813 | 91.8% |
| Claude Sonnet 4 | 77.5% | 0.813 | 0.787 | 84.1% |
| GPT-4o | 71.6% | 0.845 | 0.766 | 88.4% |
| GPT-5.2 | 68.0% | 0.897 | 0.771 | 91.8% |
### Metrics
- ±1.0 Accuracy: Percentage of predictions within 1.0 point of the human grade (0-5 scale)
- MAE: Mean Absolute Error
- QWK: Quadratic Weighted Kappa (inter-rater agreement)
- Off-Topic Accuracy: Correct classification of on-topic, off-topic, and no-answer responses
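Quadratic Weighted Kappa penalizes disagreements by the squared distance between grades. A minimal reference implementation, assuming integer grade bins (half-point grades on the 0-5 scale would first be doubled to map onto 0-10 integers):

```python
def quadratic_weighted_kappa(human, model, n_classes):
    """QWK = 1 - (weighted observed disagreement) / (weighted expected disagreement)."""
    n = len(human)
    # Observed confusion matrix
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for h, m in zip(human, model):
        obs[h][m] += 1
    # Expected matrix from the marginal histograms (chance agreement)
    hist_h = [sum(row) for row in obs]
    hist_m = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]
    exp = [[hist_h[i] * hist_m[j] / n for j in range(n_classes)] for i in range(n_classes)]
    # Quadratic disagreement weights: (i - j)^2, normalized
    w = [[(i - j) ** 2 / (n_classes - 1) ** 2 for j in range(n_classes)]
         for i in range(n_classes)]
    num = sum(w[i][j] * obs[i][j] for i in range(n_classes) for j in range(n_classes))
    den = sum(w[i][j] * exp[i][j] for i in range(n_classes) for j in range(n_classes))
    return 1.0 - num / den

# Perfect agreement gives kappa = 1.0
print(quadratic_weighted_kappa([0, 2, 5, 3], [0, 2, 5, 3], 6))  # -> 1.0
```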
## How to Use
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model (bf16 matches the training precision) and the adapter's tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("KamalEzzo/gemma2-9b-it-essay-grading-lora")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "KamalEzzo/gemma2-9b-it-essay-grading-lora")
model.eval()
```
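This card does not specify the exact grading prompt; a hypothetical prompt builder matching the rubric above might look like this (the template wording is an assumption, not the trained prompt):

```python
def build_grading_prompt(question: str, answer: str) -> str:
    """Assemble a rubric-based grading instruction (illustrative template only)."""
    rubric = (
        "Grade the student's answer on four criteria: "
        "Clarity (0-2), Terminology (0-2), Coverage (0-2), Accuracy (0-4). "
        "Report each score and the final grade (sum / 2, on a 0-5 scale)."
    )
    return f"{rubric}\n\nQuestion: {question}\n\nStudent answer: {answer}"

prompt = build_grading_prompt(
    "Define opportunity cost.",
    "It is the value of the next best alternative forgone.",
)
```

The resulting string would then go through `tokenizer.apply_chat_template` and `model.generate` as usual.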
## Repository
- Code & notebooks: GitHub
- Dataset: Zenodo (DOI: 10.5281/zenodo.18856922)
## Citation
If you use this model, please cite:
```bibtex
@misc{abdulfattah2026essaygrading,
  title={Automated Essay Grading with Fine-Tuned Gemma 2 9B-IT},
  author={Abdul-Fattah, Kamal},
  year={2026},
  url={https://huggingface.co/KamalEzzo/gemma2-9b-it-essay-grading-lora}
}
```
## Framework Versions
- PEFT 0.18.1
- Transformers 4.x
- PyTorch 2.x