# PCL RoBERTa-Large Ensemble
A 5-fold ensemble of roberta-large fine-tuned for binary Patronizing and Condescending Language (PCL) detection (SemEval 2022 Task 4, Subtask 1).
## Model Description
This model detects whether a paragraph contains patronizing or condescending language toward vulnerable communities. It is an ensemble of five roberta-large models, one per stratified cross-validation fold, whose predictions are combined via CAWPE-inspired weighted averaging (each fold weighted by its CV F1).
Key techniques:
- Focal Loss (alpha=0.85, gamma=2.0) to handle class imbalance
- Keyword prepending: the target community keyword is prepended to the input text
- Threshold optimization: optimal classification threshold (t=0.40) found via post-hoc sweep on CV predictions
- Collapse detection: automatic reinitialization if a fold produces near-constant outputs
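The focal loss listed above (alpha=0.85, gamma=2.0) can be sketched in PyTorch as follows. This is a minimal illustration of the technique, not the exact training code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.85, gamma=2.0):
    # logits: (N, 2), targets: (N,) with values in {0, 1}
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    # alpha weights the rare positive (PCL) class;
    # (1 - pt)^gamma down-weights easy, already-confident examples
    alpha_t = alpha * targets.float() + (1 - alpha) * (1 - targets.float())
    return (-alpha_t * (1 - pt) ** gamma * log_pt).mean()
```

A confidently correct prediction contributes almost nothing to the loss, so gradient signal concentrates on the hard, mostly positive, examples.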
## Training Details
| Hyperparameter | Value |
|---|---|
| Base model | roberta-large |
| Max sequence length | 512 |
| Learning rate | 1e-5 |
| Batch size | 8 |
| Epochs | 5 |
| Folds | 5 (Stratified K-Fold) |
| Optimizer | AdamW (weight_decay=0.01) |
| Scheduler | Linear with 10% warmup |
| Seed | 123 |
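The collapse detection listed under key techniques can be sketched as a variance check on a fold's dev-set probabilities; the tolerance value here is an illustrative assumption, not the value used in training:

```python
import numpy as np

def fold_collapsed(dev_probs, tol=0.02):
    # A fold has "collapsed" when it emits near-constant probabilities,
    # i.e. it predicts (almost) the same class for every input.
    # Such a fold would be reinitialized and retrained.
    return float(np.std(dev_probs)) < tol
```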
## Results
| Metric | Value |
|---|---|
| Dev F1 | 0.6333 |
| Dev Precision | 0.60 |
| Dev Recall | 0.67 |
| Mean CV F1 | 0.5892 |
| Optimal threshold | 0.40 |
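The post-hoc threshold sweep that produced t=0.40 can be sketched as below: collect out-of-fold probabilities from cross-validation, then pick the threshold maximizing F1 on the positive class. The grid and the plain-Python F1 computation are illustrative assumptions:

```python
import numpy as np

def sweep_threshold(probs, labels, grid=None):
    # probs: out-of-fold PCL probabilities; labels: gold binary labels
    if grid is None:
        grid = np.arange(0.05, 0.96, 0.01)
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = (probs >= t).astype(int)
        tp = int(((preds == 1) & (labels == 1)).sum())
        fp = int(((preds == 1) & (labels == 0)).sum())
        fn = int(((preds == 0) & (labels == 1)).sum())
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1
```

Because the sweep uses only CV predictions, the dev set stays untouched until final evaluation.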
### Per-fold CV F1
| Fold | F1 |
|---|---|
| 1 | 0.6323 |
| 2 | 0.5539 |
| 3 | 0.5515 |
| 4 | 0.6040 |
| 5 | 0.6045 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load all 5 fold models (the tokenizer is shared, so load it once)
tokenizer = AutoTokenizer.from_pretrained(
    "noufwithy/pcl-roberta-large-ensemble", subfolder="fold_0"
)
models = []
for fold in range(5):
    model = AutoModelForSequenceClassification.from_pretrained(
        "noufwithy/pcl-roberta-large-ensemble", subfolder=f"fold_{fold}"
    )
    model.eval()
    models.append(model)

# Prepend the target community keyword to the text (as done during training)
keyword = "homeless people"
text = "These poor people just need someone to help them get back on their feet."
input_text = f"{keyword} {text}"
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)

# Ensemble prediction: CAWPE-inspired average, weighting each fold by its CV F1
weights = [0.6323, 0.5539, 0.5515, 0.6040, 0.6045]
probs = []
for model, w in zip(models, weights):
    with torch.no_grad():
        logits = model(**inputs).logits
    prob = torch.softmax(logits, dim=-1)[0, 1].item()
    probs.append(prob * w)

avg_prob = sum(probs) / sum(weights)
prediction = int(avg_prob >= 0.40)  # optimal threshold from the CV sweep
print(f"PCL probability: {avg_prob:.4f}, Prediction: {'PCL' if prediction else 'Not PCL'}")
```
## Evaluation results
- Dev F1 on SemEval 2022 Task 4 (Subtask 1, PCL detection): 0.633 (self-reported)