AI & ML interests
NLU
Agreemind
AI-powered contract risk analysis across multiple legal domains.
We build specialized NLP models and an end-to-end analysis pipeline that automatically detects risky clauses in legal contracts: Terms of Service, banking agreements, NDAs, and general contracts.
🏗️ Architecture
Agreemind uses a multi-engine routing architecture. A router first classifies each document by type, then dispatches it to the specialized engine for that type:
| Engine | Documents | Approach | Model |
|---|---|---|---|
| ToS Engine | Terms of Service | Multi-label classification | lexglue-roberta-unfair-tos |
| English Banking Engine | US/UK bank agreements | Multi-label classification + post-processing | en-banking-roberta |
| Turkish Banking Engine | Turkish bank contracts | Multi-label classification | banking-bert-turkish |
| NDA Engine | Non-Disclosure Agreements | Natural Language Inference (NLI) | contractnli-distilbert-nda |
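The routing step can be pictured with a minimal sketch. How the real router classifies documents is not described here, so this stand-in uses a hypothetical keyword heuristic. `route_document`, `dispatch`, `ENGINE_KEYWORDS` and every keyword are illustrative assumptions. Only the engine-to-model mapping comes from the architecture table.

```python
# Hypothetical keyword router: a stand-in for Agreemind's actual document classifier.
# Engine -> model mapping follows the architecture table; keywords are illustrative only.
ENGINE_MODELS = {
    "tos": "lexglue-roberta-unfair-tos",
    "en_banking": "en-banking-roberta",
    "tr_banking": "banking-bert-turkish",
    "nda": "contractnli-distilbert-nda",
}

ENGINE_KEYWORDS = {
    "nda": ["non-disclosure", "confidential information", "receiving party"],
    "tr_banking": ["banka", "kredi", "faiz", "hesap"],
    "en_banking": ["overdraft", "checking account", "interest rate", "deposit"],
    "tos": ["terms of service", "user account", "you agree", "our services"],
}


def route_document(text: str) -> str:
    """Return the engine whose keywords occur most often in `text` (hypothetical heuristic)."""
    lowered = text.lower()
    scores = {
        engine: sum(lowered.count(kw) for kw in keywords)
        for engine, keywords in ENGINE_KEYWORDS.items()
    }
    best_engine = max(scores, key=scores.get)
    # Assumption: fall back to the ToS engine when no keyword matches at all.
    return best_engine if scores[best_engine] > 0 else "tos"


def dispatch(text: str) -> tuple[str, str]:
    """Pick an engine and return it together with the model it would load."""
    engine = route_document(text)
    return engine, ENGINE_MODELS[engine]


if __name__ == "__main__":
    print(dispatch("The Receiving Party shall keep all Confidential Information secret."))
```

Keyword counting is only a placeholder for illustration. A production router would more plausibly be a trained document-type classifier, but the dispatch shape (classify, then look up the engine's model) stays the same.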
📊 Models
Terms of Service (LexGLUE UNFAIR-ToS)
Fine-tuned on the LexGLUE UNFAIR-ToS benchmark and evaluated on the official test set (1,607 samples). μ-F1 and m-F1 are micro- and macro-averaged F1 scores.
| Model | μ-F1 | m-F1 | Best for |
|---|---|---|---|
| lexglue-roberta-unfair-tos | 96.1 | 84.4 | 🥇 Production — best accuracy |
| lexglue-legalbert-unfair-tos | 96.0 | 84.1 | 🥈 Legal domain |
| lexglue-deberta-unfair-tos | 95.6 | 82.2 | General purpose |
| lexglue-legalbert-small-unfair-tos | 95.0 | 78.5 | ⚡ Fast inference |
LexGLUE leaderboard reference: Legal-BERT, as reported in the LexGLUE paper, scores 96.0 μ-F1 / 83.0 m-F1. Our two best models match or exceed this on both metrics.
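The two metrics weight labels differently. μ-F1 pools true positives, false positives and false negatives across all labels before computing a single F1, so frequent clause types dominate. m-F1 computes F1 per label and takes the unweighted mean, so rare clause types count as much as common ones. This fits the table, where the spread between models is much larger on m-F1 (78.5 to 84.4) than on μ-F1 (95.0 to 96.1). The plain-Python sketch below computes both from binary label matrices. `micro_macro_f1` is a hypothetical helper, not part of any Agreemind package.

```python
def micro_macro_f1(y_true, y_pred):
    """Micro- and macro-averaged F1 for multi-label data given as lists of 0/1 rows."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs at all
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts over all labels, then compute one F1
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro


if __name__ == "__main__":
    # A frequent label predicted perfectly, a rare label missed once
    y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
    y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
    micro, macro = micro_macro_f1(y_true, y_pred)
    print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.889 macro=0.500
```

Missing a single rare-label instance costs only about 0.11 of micro-F1 here but halves macro-F1. That is why m-F1 is the more discriminating column for comparing models.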
English Banking
Fine-tuned RoBERTa on 90 labeled US/UK consumer banking contracts (4,337 clauses). Detects 9 risk categories.
| Model | μ-F1 | Labels | Data |
|---|---|---|---|
| en-banking-roberta | 79.6 | 9 risk categories | 90 contracts, 4.3k clauses |
Risk categories include: Hidden Fees, Unilateral Rate/Terms Changes, Overdraft Penalties, Auto-enrollment, Data Sharing, Dispute Limitations, Account Freeze/Closure, Rewards Restrictions.
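The architecture table lists "post-processing" for the English Banking Engine without describing it. One common option for multi-label classifiers is per-category decision thresholds instead of a single 0.5 cut-off. The sketch below assumes that approach purely for illustration. The threshold values, `THRESHOLDS`, and `flag_banking_risks` are hypothetical. Only the category names come from the list above.

```python
import math

# Hypothetical per-category thresholds (illustrative values, not tuned Agreemind settings).
DEFAULT_THRESHOLD = 0.5
THRESHOLDS = {
    "Hidden Fees": 0.4,   # lower bar: assume recall matters more for this category
    "Data Sharing": 0.6,  # higher bar: assume this category is prone to false positives
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def flag_banking_risks(logits: dict[str, float]) -> list[tuple[str, float]]:
    """Turn raw per-category logits into flagged risks, most confident first."""
    flagged = []
    for category, logit in logits.items():
        prob = sigmoid(logit)  # independent sigmoid per label (multi-label setup)
        if prob >= THRESHOLDS.get(category, DEFAULT_THRESHOLD):
            flagged.append((category, round(prob, 3)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example_logits = {"Hidden Fees": -0.2, "Overdraft Penalties": 1.5, "Data Sharing": 0.3}
    print(flag_banking_risks(example_logits))
```

With these made-up values, "Hidden Fees" (probability about 0.45) is flagged thanks to its lower threshold, while "Data Sharing" (about 0.57) is suppressed by its higher one. In practice such thresholds would be tuned on a validation split.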
Turkish Banking
Fine-tuned BERT on manually labeled Turkish bank contracts.
| Model | Architecture | Language |
|---|---|---|
| banking-bert-turkish | bert-base-turkish-cased | Turkish |
NDA (Contract NLI)
NLI-based models trained on ContractNLI for 17 standard NDA provisions (3-class: Entailment / Contradiction / Not Mentioned).
| Model | Architecture | Best for |
|---|---|---|
| contractnli-distilbert-nda | DistilBERT | ⚡ Production — fast |
| contractnli-legalbert-nda-weighted | Legal-BERT | Best accuracy |
| contractnli-legalbert-nda-standard | Legal-BERT | Standard loss |
| contractnli-bert-nda-weighted | BERT | Weighted loss |
| contractnli-bert-nda-standard | BERT | Standard loss |
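In the ContractNLI framing, each standard NDA provision is phrased as a hypothesis and the contract serves as the premise, so the model emits one 3-class prediction per provision. The sketch below shows only the last step: turning those per-provision logits into labels with a softmax. The class order in `NLI_LABELS` is an assumption (check the model's `config.id2label`), and the provision names and logits are made up for illustration.

```python
import math

# Assumed class order; verify against the model's config.id2label before relying on it.
NLI_LABELS = ["Entailment", "Contradiction", "NotMentioned"]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_nda(logits_by_provision: dict[str, list[float]]) -> dict[str, tuple[str, float]]:
    """Map each provision (hypothesis) to its predicted label and probability."""
    summary = {}
    for provision, logits in logits_by_provision.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        summary[provision] = (NLI_LABELS[best], round(probs[best], 3))
    return summary


if __name__ == "__main__":
    # Illustrative logits for two provisions, as a 3-class NLI head might emit them
    example = {
        "Return of confidential information": [2.0, 0.1, -1.0],
        "Sharing with third parties": [-0.5, 1.8, 0.2],
    }
    for provision, (label, prob) in summarize_nda(example).items():
        print(f"{provision}: {label} ({prob})")
```

Unlike the sigmoid-based ToS and banking engines, the NLI head uses a softmax because the three outcomes are mutually exclusive for a given provision.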
🚀 Quick Start
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Terms of Service analysis
model_id = "Agreemind/lexglue-roberta-unfair-tos"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "We may terminate your account at any time without notice."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    # Multi-label task: an independent sigmoid per logit; [0] selects the single input
    probs = torch.sigmoid(model(**inputs).logits)[0].tolist()

# The 8 UNFAIR-ToS unfair-clause types, in LexGLUE label order
labels = ["Limitation of liability", "Unilateral termination", "Unilateral change",
          "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"]

# Print every label above the 0.5 decision threshold, most confident first
for label, prob in sorted(zip(labels, probs), key=lambda x: x[1], reverse=True):
    if prob > 0.5:
        print(f"  {label}: {prob:.3f}")
```
📎 Links
- LexGLUE Paper — ToS benchmark
- ContractNLI — NDA benchmark
- LexGLUE Dataset