Model Card: Fine-tuned GPT-2 on Mental Health & Psychology Datasets (45K rows, 10 Epochs)

Model Description

This model is a fine-tuned version of GPT-2 on a combined dataset of ~45,000 mental health and psychology conversation samples drawn from 6 public datasets. It is a causal language model trained to generate empathetic, contextually appropriate responses to mental health-related prompts, making it suitable for counseling conversation research, mental health chatbot prototyping, and psychology NLP tasks.

  • Developed by: praniil
  • Model type: Causal Language Model (GPT-2)
  • Language(s): English
  • License: MIT
  • Finetuned from model: gpt2 (OpenAI GPT-2 124M, via Hugging Face)

Uses

Direct Use

This model can be used out of the box for mental health and psychology text generation: given a user message or question as a prompt, it generates a response in the style of a counseling conversation.

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

model_name = "Pranilllllll/finetuned_gpt2_45krows_10epochs"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()  # disable dropout for inference

prompt = "I have been feeling very anxious and overwhelmed lately."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,   # length of the generated continuation
        do_sample=True,       # sample rather than decode greedily
        temperature=0.9,
        top_p=0.95,           # nucleus sampling
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Downstream Use

This model can be plugged into larger pipelines for:

  • Mental health chatbot or virtual counselor prototyping
  • Generating synthetic counseling conversation data
  • Psychology NLP research and benchmarking
  • Empathetic response generation systems

Out-of-Scope Use

  • Not a substitute for professional mental health care. This model should never be used as a replacement for licensed therapists or clinical diagnosis.
  • Not suitable for crisis intervention or emergency mental health situations.
  • Not designed for factual question answering or knowledge retrieval tasks.
  • Should not be deployed in production-facing mental health applications without thorough safety evaluation.

Bias, Risks, and Limitations

  • Clinical risk: The model may generate responses that sound plausible but are clinically incorrect, harmful, or inappropriate for vulnerable users. Always include human oversight.
  • Data bias: The model reflects patterns and biases present across the 6 source datasets. Some datasets may over-represent specific demographics or therapeutic styles.
  • Hallucination: GPT-2 based models may generate fluent but factually incorrect or contextually inappropriate text.
  • Short context window: Sequences were truncated to 128 tokens during training, so very long conversations may lose context.
  • Small model size: At 124M parameters, GPT-2 has limited capacity for nuanced reasoning compared to larger modern LLMs.

Recommendations

This model is intended for research and prototyping only. It should not be deployed in any real-world mental health support context without rigorous safety evaluation, content filtering, and human-in-the-loop oversight.


How to Get Started with the Model

Install dependencies:

pip install transformers torch

Then use the inference script in the Direct Use section above.


Training Details

Training Data

The model was trained on a combined dataset of ~45,000 rows sourced from 6 public mental health and psychology datasets on Hugging Face:

| # | Dataset | Description |
|---|---------|-------------|
| 1 | marmikpandya/mental-health | Mental health Q&A pairs |
| 2 | fadodr/mental_health_therapy | Therapy conversation pairs |
| 3 | Amod/mental_health_counseling_conversations | Counseling context-response pairs |
| 4 | jkhedri/psychology-dataset | Psychology Q&A pairs |
| 5 | samhog/psychology-6k | Psychology input-output pairs |
| 6 | RAJJ18/mental_health_dataset | Mental health conversations (3,000 rows sampled) |

All datasets were standardized to a unified input / output column format before concatenation. Dataset 6 was randomly sampled to 3,000 rows (seed=42) for balance.
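
The standardization and sampling described above can be sketched in plain Python (an illustrative sketch only; the `standardize` and `sample_rows` helpers and the toy column names are hypothetical, not the author's actual preprocessing code):

```python
import random

def standardize(rows, input_key, output_key):
    """Map a dataset's own column names onto the unified input/output schema."""
    return [{"input": r[input_key], "output": r[output_key]} for r in rows]

def sample_rows(rows, n, seed=42):
    """Reproducibly downsample a dataset (dataset 6 was cut to 3,000 rows, seed=42)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

# Toy usage: unify two differently named datasets, then downsample one.
qa = standardize([{"question": "q1", "answer": "a1"}], "question", "answer")
conv = standardize([{"Context": f"c{i}", "Response": f"r{i}"} for i in range(10)],
                   "Context", "Response")
combined = qa + sample_rows(conv, 3)
print(len(combined))  # 4
```

Because the sampler is seeded, the same 3 rows are selected on every run, which keeps the combined dataset reproducible.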

Training Procedure

Preprocessing

  • All datasets normalized to input and output columns
  • Input and output concatenated as a single string: "{input} {output}"
  • Tokenized using the GPT-2 BPE tokenizer (AutoTokenizer from gpt2)
  • pad_token set to eos_token
  • Sequences truncated and padded to max length of 128 tokens
  • Labels set equal to input_ids for causal language modelling (next-token prediction)
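
A minimal sketch of the steps above, using a stand-in word-level encoder in place of the real GPT-2 BPE tokenizer (the `preprocess` function and toy encoder are illustrative assumptions; only the eos id 50256, the max length of 128, and labels = input_ids come from the card):

```python
EOS_ID = 50256  # GPT-2's eos token id, reused here as the pad token
MAX_LEN = 128

def preprocess(input_text, output_text, encode):
    """Concatenate input and output into one string, tokenize, truncate and
    pad to MAX_LEN, and set labels equal to input_ids for causal LM training."""
    ids = encode(f"{input_text} {output_text}")[:MAX_LEN]
    ids = ids + [EOS_ID] * (MAX_LEN - len(ids))
    return {"input_ids": ids, "labels": list(ids)}

# Toy encoder: one "token" per whitespace-separated word.
toy_encode = lambda s: list(range(len(s.split())))
ex = preprocess("I feel low.", "That sounds hard.", toy_encode)
print(len(ex["input_ids"]), ex["input_ids"] == ex["labels"])  # 128 True
```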

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Base model | gpt2 (124M parameters) |
| Epochs | 10 |
| Training rows | ~45,000 |
| Per-device train batch size | 4 |
| Per-device eval batch size | 4 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 128 tokens |
| Training regime | fp16 mixed precision |
| Evaluation strategy | Every 5,000 steps |
| Save strategy | Every 5,000 steps |
| Logging steps | Every 50 steps |
| Best model metric | Validation loss (lower is better) |
| Checkpoints kept | 2 (save_total_limit=2) |
| Optimizer | AdamW (Hugging Face default) |
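
These settings map onto Hugging Face `TrainingArguments` roughly as follows (a sketch reconstructed from the table above, not the author's actual training script; argument names vary slightly across transformers versions, e.g. `evaluation_strategy` was renamed `eval_strategy` in newer releases):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="finetuned_gpt2_45krows_10epochs",
    num_train_epochs=10,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=3e-5,
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,                        # mixed-precision training
    evaluation_strategy="steps",      # "eval_strategy" in newer versions
    eval_steps=5000,
    save_steps=5000,
    logging_steps=50,
    save_total_limit=2,               # keep only the 2 most recent checkpoints
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,          # lower validation loss is better
    report_to="tensorboard",
)
```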

Evaluation Dataset

The test split of fadodr/mental_health_therapy (dataset 2) was used as the held-out validation set during training.


Evaluation

Testing Data, Factors & Metrics

Testing Data

The test split of fadodr/mental_health_therapy, held out from training and used for validation loss tracking.

Metrics

  • Training Loss: Tracked every 50 steps via TensorBoard logging
  • Validation Loss: Evaluated every 5,000 steps; best model checkpoint selected based on lowest validation loss
  • Perplexity: Derived from validation loss; lower perplexity indicates better language modelling
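
Perplexity here follows the standard definition for causal language models: the exponential of the mean per-token cross-entropy, which is exactly the validation loss the Trainer reports.

```python
import math

def perplexity(validation_loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy loss)."""
    return math.exp(validation_loss)

# e.g. a validation loss of 2.0 corresponds to a perplexity of about 7.39
print(round(perplexity(2.0), 2))
```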

Results

Training and validation loss curves are available in the new_graph/ directory. Full training logs are stored in new_logs/.


Technical Specifications

Model Architecture and Objective

  • Architecture: GPT-2 (decoder-only transformer)
  • Objective: Causal Language Modelling (next-token prediction)
  • Parameters: 124M
  • Layers: 12 transformer blocks
  • Attention heads: 12
  • Hidden size: 768
  • Max context length: 1024 tokens (128 tokens used during training)
  • Tokenizer: GPT-2 BPE tokenizer (vocab size: 50,257)

Compute Infrastructure

Hardware

  • CUDA-enabled GPU (local machine)

Software

  • Python 3.8+
  • PyTorch
  • Hugging Face transformers
  • Hugging Face datasets
  • TensorBoard (for logging)

Environmental Impact

  • Hardware Type: CUDA-enabled GPU
  • Cloud Provider: Not applicable (local training)
  • Compute Region: Nepal
  • Carbon Emitted: Not measured

Citation

@misc{praniil2024finetuned-gpt2-mentalhealth-10epochs,
  author       = {praniil},
  title        = {Fine-tuned GPT-2 on Mental Health and Psychology Datasets (45K rows, 10 Epochs)},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Pranilllllll/finetuned_gpt2_45krows_10epochs}},
}

Model Card Authors

praniil

Model Card Contact

Open an issue at https://github.com/praniil/finetuned_gpt2_45krows_n5/issues
