---
license: mit
language:
- en
tags:
- text2text-generation
- flan-t5
- bible
- simplification
- readability
- difficulty-scoring
- multi-task
- seq2seq
datasets:
- LoveJesus/passage-difficulty-simplifier-dataset-chirho
pipeline_tag: text2text-generation
base_model: google/flan-t5-base
model-index:
- name: passage-difficulty-simplifier-chirho
  results:
  - task:
      type: text2text-generation
      name: Text Generation
    metrics:
    - name: Eval Loss
      type: eval_loss
      value: 2.228
    - name: Difficulty Accuracy
      type: accuracy
      value: 0.9377
    - name: Combined Score
      type: combined_score
      value: 0.3781
---

	<!-- For God so loved the world that he gave his only begotten Son,
	that whoever believes in him should not perish but have eternal life. - John 3:16 -->

	# Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)

	A fine-tuned google/flan-t5-base (248M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model. Upgraded from flan-t5-small (80M) for improved accuracy.

	## Model Description

	This model takes Bible passages as input and performs one of two tasks, selected by a natural language prefix:

	### Task 1: Difficulty Scoring

	Analyzes a Bible passage and produces a structured difficulty assessment.

	- Prefix: `rate difficulty:`
- Output format: `reading_level: [1-12] | vocab_complexity: [low/medium/high] | archaic_forms: [count] | difficulty: [easy/medium/hard]`

	### Task 2: Simplification

	Converts archaic or complex Bible passages into plain modern English.

	- Prefix: `simplify:`
	- Output: Plain-language paraphrase of the input verse

	## Training Details

| Parameter | Value |
|---|---|
| Base model | `google/flan-t5-base` (248M params) |
| Architecture | Encoder-Decoder (T5) |
| Training approach | Full fine-tuning, multi-task |
| Trainer | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
| Epochs | 5 |
| Batch size | 32 (H200 GPU) |
| Effective batch size | 32 (gradient accumulation = 1 on H200) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 10% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.1 |
| Mixed precision | bf16 (H200) |
| Max input length | 256 tokens |
| Max target length | 256 tokens |
| Early stopping | Patience = 2, monitoring `eval_loss` |
| Best model selection | Lowest `eval_loss` |
| Generation (eval) | `predict_with_generate=True`, beam search |
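
The learning-rate rows above (peak 2e-4, cosine decay, 10% warmup) can be made concrete with a small stand-alone function. This is a sketch only. It assumes linear warmup from zero followed by a half-cosine decay to zero, which is a common way to implement such schedules. The function name and the `total_steps` value are hypothetical and are not taken from the training script.

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-4, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # hypothetical; the real value depends on dataset size and epochs
for step in (0, 50, 100, 550, 1000):
    print(step, f"{cosine_lr_with_warmup(step, total_steps):.2e}")
```

With these assumptions the rate reaches its peak at the end of warmup (step 100 of 1000) and falls to half the peak midway through the decay phase.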

	### Dataset

Trained on 120K+ examples covering both tasks. The data is split 80/10/10 by Bible book, so parallel renderings of the same verse never land in different splits (this prevents verse-level leakage):

| Task | Target Count | Description |
|---|---|---|
| Difficulty scoring | ~64K | Verses from 6 translations with algorithmically computed labels |
| Simplification | ~96K | Cross-translation pairs mapping complex to simple English |
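
Splitting by book rather than by verse can be sketched as follows. This is an illustration under assumptions, not the dataset builder itself (which is written in TypeScript): the `book` field, the seed, and the placeholder book names are hypothetical.

```python
import random

def split_books(books, seed=42, train_frac=0.8, val_frac=0.1):
    """Assign every book to exactly one of train/val/test."""
    shuffled = sorted(books)                 # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return {
        "train": set(shuffled[:n_train]),
        "val": set(shuffled[n_train:n_train + n_val]),
        "test": set(shuffled[n_train + n_val:]),
    }

def assign_split(example, book_splits):
    """Look up the split for an example via its (hypothetical) `book` field."""
    for name, books in book_splits.items():
        if example["book"] in books:
            return name
    raise KeyError(example["book"])

books = [f"book_{i:02d}" for i in range(66)]  # placeholder book names
splits = split_books(books)
print({name: len(members) for name, members in splits.items()})
print(assign_split({"book": "book_07", "text": "..."}, splits))
```

Because the split is decided per book, every translation of a given verse inherits the same split, which is what keeps near-duplicate text out of the evaluation sets.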

	#### Translations Used

| Translation | Style | Role |
|---|---|---|
| KJV (King James Version) | Formal, archaic | Complex source |
| ASV (American Standard Version) | Formal, dated | Complex source |
| YLT (Young's Literal Translation) | Ultra-literal | Complex source |
| Darby Bible | Literal, dated | Complex source / Difficulty scoring |
| BBE (Bible in Basic English) | 850-word vocabulary, ~Grade 4 | Simple target |
| OEB (Open English Bible) | Modern, public domain | Simple target |

	#### Simplification Pairs

| Complex Source | Simple Target |
|---|---|
| KJV | BBE |
| KJV | OEB |
| ASV | BBE |
| YLT | OEB |
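
One way to assemble such pairs is to join each source and target translation on the verse reference. The sketch below assumes each translation is held as a dictionary keyed by `(book, chapter, verse)`. That layout, the function name, and the placeholder target text are hypothetical. Only the pair list and the `simplify:` prefix come from this card.

```python
PAIRS = [("KJV", "BBE"), ("KJV", "OEB"), ("ASV", "BBE"), ("YLT", "OEB")]

def build_simplification_pairs(translations, pairs=PAIRS):
    """Create one example per verse that exists in both source and target."""
    examples = []
    for source, target in pairs:
        src_verses = translations.get(source, {})
        tgt_verses = translations.get(target, {})
        # Only verses present in both translations can form a pair.
        for ref in sorted(src_verses.keys() & tgt_verses.keys()):
            examples.append({
                "ref": ref,
                "source": source,
                "target_translation": target,
                "input": f"simplify: {src_verses[ref]}",
                "target": tgt_verses[ref],
            })
    return examples

# Toy data: the BBE string is a placeholder, not the actual BBE wording.
translations = {
    "KJV": {("John", 11, 35): "Jesus wept.", ("John", 11, 36): "Then said the Jews, Behold how he loved him!"},
    "BBE": {("John", 11, 35): "<BBE rendering of John 11:35>"},
}
examples = build_simplification_pairs(translations)
print(examples[0]["input"])
```

Verses missing from either side are skipped, so an incomplete target translation reduces the number of pairs rather than producing empty targets.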

	#### Data Source

	Bible text sourced from ScrollMapper Bible Databases (public domain translations on GitHub).

	#### Difficulty Scoring Labels

	Labels are computed algorithmically from textual features:

	- Reading level (1-12): Approximate Flesch-Kincaid grade level analog, adjusted for archaic vocabulary and uncommon word ratio
	- Vocabulary complexity (low/medium/high): Ratio of words outside a ~3,000-word common English vocabulary
	- Archaic forms (count): Number of archaic English words detected (thee, thou, hath, doth, -eth/-est verb endings, etc.)
	- Difficulty (easy/medium/hard): Composite score from reading level, vocabulary complexity, and archaic form count
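
Two of these features, the archaic-form count and vocabulary complexity, can be sketched in a few lines. The code below is illustrative only: the word list is a short hypothetical sample, the thresholds `low` and `high` are invented, and the suffix rule is deliberately naive. The actual labeling code may differ on all three points.

```python
import re

# Hypothetical, abbreviated word list; the card names only a few examples.
ARCHAIC_WORDS = {"thee", "thou", "thy", "thine", "ye", "hath", "doth", "unto", "verily"}
# Naive -eth / -est rule. It also matches modern words such as "greatest",
# so a real detector would need an exception list.
ARCHAIC_SUFFIX = re.compile(r"^[a-z]{2,}(eth|est)$")

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def count_archaic_forms(text):
    """Count tokens that are in the word list or carry an archaic verb ending."""
    return sum(1 for w in tokenize(text)
               if w in ARCHAIC_WORDS or ARCHAIC_SUFFIX.match(w))

def vocab_complexity(text, common_words, low=0.1, high=0.3):
    """Bucket the share of words outside `common_words` (thresholds are invented)."""
    words = tokenize(text)
    ratio = sum(w not in common_words for w in words) / max(1, len(words))
    return "low" if ratio < low else "medium" if ratio < high else "high"

def format_label(reading_level, vocab, archaic, difficulty):
    """Render the four fields in the structured output format used by Task 1."""
    return (f"reading_level: {reading_level} | vocab_complexity: {vocab} | "
            f"archaic_forms: {archaic} | difficulty: {difficulty}")

verse = "Verily, verily, I say unto thee, Except a man be born again, he cannot see the kingdom of God."
print(count_archaic_forms(verse))
```

Reading level and the composite difficulty score are left out here because the card does not give their formulas.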

	## Usage

	### Quick Start: Simplification

	```python
	# For God so loved the world that he gave his only begotten Son,
	# that whoever believes in him should not perish but have eternal life. - John 3:16

	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
	model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")

	input_text_chirho = "simplify: And the LORD God formed man of the dust of the ground, and breathed into his nostrils the breath of life; and man became a living soul."

	inputs_chirho = tokenizer_chirho(input_text_chirho, return_tensors="pt", max_length=256, truncation=True)
	outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
	result_chirho = tokenizer_chirho.decode(outputs_chirho[0], skip_special_tokens=True)

	print(result_chirho)
	# Expected: A simplified, modern English version of the verse
	```

	### Quick Start: Difficulty Scoring

```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import re

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")

input_text_chirho = "rate difficulty: For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life."

inputs_chirho = tokenizer_chirho(input_text_chirho, return_tensors="pt", max_length=256, truncation=True)
outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
raw_output_chirho = tokenizer_chirho.decode(outputs_chirho[0], skip_special_tokens=True)

print(raw_output_chirho)
# Expected: "reading_level: X | vocab_complexity: Y | archaic_forms: Z | difficulty: W"

# Parse structured output; each search returns None if the field is missing
reading_level_chirho = re.search(r"reading_level:\s*(\d+)", raw_output_chirho)
difficulty_chirho = re.search(r"difficulty:\s*(\w+)", raw_output_chirho)
vocab_chirho = re.search(r"vocab_complexity:\s*(\w+)", raw_output_chirho)
archaic_chirho = re.search(r"archaic_forms:\s*(\d+)", raw_output_chirho)

if reading_level_chirho:
    print(f"Reading Level: Grade {reading_level_chirho.group(1)}")
if difficulty_chirho:
    print(f"Difficulty: {difficulty_chirho.group(1)}")
if vocab_chirho:
    print(f"Vocabulary Complexity: {vocab_chirho.group(1)}")
if archaic_chirho:
    print(f"Archaic Forms: {archaic_chirho.group(1)}")
```

	### Batch Inference

```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho.eval()

# Both task prefixes can be mixed in a single batch
verses_chirho = [
    "simplify: Verily, verily, I say unto thee, Except a man be born again, he cannot see the kingdom of God.",
    "simplify: Wherefore, as by one man sin entered into the world, and death by sin; and so death passed upon all men, for that all have sinned:",
    "rate difficulty: In the beginning God created the heaven and the earth.",
    "rate difficulty: Jesus wept.",
]

# padding=True pads every verse to the longest one in the batch
inputs_chirho = tokenizer_chirho(verses_chirho, return_tensors="pt", max_length=256, truncation=True, padding=True)

with torch.no_grad():
    outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)

results_chirho = tokenizer_chirho.batch_decode(outputs_chirho, skip_special_tokens=True)

for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
    print(f"Input: {verse_chirho}")
    print(f"Output: {result_chirho}\n")
```

	## Evaluation

	### Metrics

| Task | Metric | Description |
|---|---|---|
| Difficulty Scoring | `difficulty_accuracy_chirho` | Exact match on easy/medium/hard label |
| Difficulty Scoring | Reading level MAE | Mean absolute error on grade level (1-12) |
| Difficulty Scoring | Vocab complexity accuracy | Exact match on low/medium/high |
| Simplification | BLEU | Corpus-level BLEU score (sacrebleu) |
| Simplification | BERTScore F1 | Semantic similarity to reference simplifications |
| Simplification | Exact match | Proportion of predictions matching reference exactly |
| Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |
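
The exact-match metrics and the combined score are simple enough to restate in code. This is a minimal sketch, not the evaluation script: the function names are hypothetical, and treating an unparseable prediction as a miss is an assumption.

```python
import re

def extract_difficulty(output):
    """Pull the easy/medium/hard label out of a structured prediction."""
    match = re.search(r"\bdifficulty:\s*(easy|medium|hard)\b", output)
    return match.group(1) if match else None

def difficulty_accuracy(predictions, references):
    """Share of predictions whose difficulty label matches the reference label."""
    pairs = list(zip(predictions, references))
    hits = sum(1 for p, r in pairs
               if extract_difficulty(p) is not None
               and extract_difficulty(p) == extract_difficulty(r))
    return hits / len(pairs) if pairs else 0.0

def exact_match(predictions, references):
    """Share of predictions identical to the reference after trimming whitespace."""
    pairs = list(zip(predictions, references))
    return sum(p.strip() == r.strip() for p, r in pairs) / len(pairs) if pairs else 0.0

def combined_score(difficulty_acc, simplification_em):
    return 0.4 * difficulty_acc + 0.6 * simplification_em

# Plugging in the reported epoch 3 values (93.77% accuracy, 0.50% exact match):
print(round(combined_score(0.9377, 0.005), 4))  # 0.3781
```

Because exact match on free-form paraphrases is close to zero, the combined score is dominated by the difficulty term and cannot exceed roughly 0.4 unless simplification exact match improves.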

	### Results (v2 - flan-t5-base upgrade)

| Metric | Score |
|---|---|
| Eval loss | 2.228 (best at epoch 3) |
| Difficulty accuracy | 93.8% |
| Simplification exact match | 0.50% |
| Combined score | 0.378 |
| Train loss | 1.964 |
| Hardware | NVIDIA H200 (143GB), ~64 min |

	### Training Trajectory

| Epoch | Eval Loss | Difficulty Acc | Combined Score |
|-------|-----------|----------------|----------------|
| 1 | 2.282 | 87.1% | 0.351 |
| 2 | 2.244 | 91.9% | 0.370 |
| 3 | 2.228 | 93.8% | 0.378 |
| 4 | 2.236 | 94.7% | 0.382 |
| 5 | 2.241 | 94.8% | 0.382 |

The best checkpoint was selected by lowest `eval_loss` (epoch 3), so the results reported above are the epoch 3 values. Difficulty accuracy kept improving through epoch 5, but eval loss rose from epoch 4 onward, which suggests mild overfitting on the simplification task.
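
The selection rule can be replayed on the eval-loss column of the trajectory table. The sketch below uses a simplified patience counter as an assumption about how early stopping behaves. It is not the trainer's callback code, and the function names are hypothetical.

```python
eval_loss_by_epoch = {1: 2.282, 2: 2.244, 3: 2.228, 4: 2.236, 5: 2.241}

def select_best_epoch(losses):
    """Epoch with the lowest eval loss."""
    return min(losses, key=losses.get)

def early_stop_epoch(losses, patience=2):
    """Epoch at which training would stop, or None if patience is never used up."""
    best = float("inf")
    bad_epochs = 0
    for epoch in sorted(losses):
        if losses[epoch] < best:
            best, bad_epochs = losses[epoch], 0   # new best: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

print(select_best_epoch(eval_loss_by_epoch))  # 3
print(early_stop_epoch(eval_loss_by_epoch))   # 5
```

Under this reading, a patience of 2 is exhausted exactly at epoch 5, which coincides with the configured epoch limit.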

	## Try It Live

	[Interactive Demo on HuggingFace Spaces](https://huggingface.co/spaces/LoveJesus/passage-difficulty-simplifier-chirho)

	The Gradio-powered demo provides two tabs:
	- Simplify: Enter any Bible verse and receive a plain-language version
	- Difficulty: Enter a verse and get reading level, vocabulary complexity, archaic form count, and overall difficulty

	## Limitations

	- Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
	- Simplification quality varies by verse length and complexity; very long passages may be truncated
	- Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
- The 248M-parameter base model is a compromise between accuracy and ease of deployment
	- Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
	- Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
	- The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible

	## Intended Use

	- Bible study tools that need plain-language paraphrasing of archaic translations
	- Reading level assessment for curriculum planning or children's ministry materials
	- Accessibility applications that present Bible text at appropriate reading levels
	- Research into text simplification for historical English

	## Out-of-Scope Use

	- Replacing authoritative Bible translations for doctrinal study
	- General-purpose text simplification outside of biblical literature
	- Machine translation between languages (this model operates only in English)

	## Model Architecture

	```
	google/flan-t5-base (Encoder-Decoder)
	Encoder: 12 layers, 12 heads, d_model=768
	Decoder: 12 layers, 12 heads, d_model=768
	Total parameters: ~248M (all trainable, full fine-tuning)
	Vocabulary: SentencePiece, 32,128 tokens
	```
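
The ~248M figure can be cross-checked from the dimensions above. This back-of-the-envelope count relies on several values that are not stated in this card and are assumed from the public `google/flan-t5-base` configuration: `d_ff=2048`, `d_kv=64`, a gated feed-forward block with three weight matrices, no bias terms, an output head that is not tied to the embeddings, and 32 relative-position buckets per stack.

```python
d_model, n_layers, n_heads, vocab = 768, 12, 12, 32128
d_ff, d_kv, rel_buckets = 2048, 64, 32      # assumed, not stated in the card

inner = n_heads * d_kv                      # attention inner dimension
attention = 4 * d_model * inner             # q, k, v, o projections, no biases
ffn = 3 * d_model * d_ff                    # gated FFN: two input projections + one output
layer_norm = d_model                        # scale vector only

encoder_layer = attention + ffn + 2 * layer_norm
decoder_layer = 2 * attention + ffn + 3 * layer_norm   # self- and cross-attention

embeddings = vocab * d_model                # shared by encoder and decoder
lm_head = vocab * d_model                   # assumed untied from the embeddings
rel_bias = 2 * rel_buckets * n_heads        # one bias table per stack
final_norms = 2 * layer_norm

total = (embeddings + lm_head
         + n_layers * (encoder_layer + decoder_layer)
         + rel_bias + final_norms)
print(f"{total:,}")
```

Under these assumptions the total is about 247.6M, consistent with the ~248M quoted above. The two vocabulary-sized matrices account for roughly a fifth of it.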

	## Repository Structure

	```
passage-difficulty-simplifier-chirho/
  src-chirho/
    train-chirho/train-simplifier-chirho.py          # Training script
    eval-chirho/evaluate-chirho.py                   # Evaluation script
    data-chirho/build-simplifier-dataset-chirho.ts   # Dataset builder (Bun/TS)
    data-chirho/download-translations-chirho.ts      # Translation downloader
    upload-hf-chirho.py                              # HuggingFace upload script
  space-chirho/
    app.py                                           # Gradio demo application
  data-chirho/
    raw-chirho/                                      # Raw Bible CSVs
    processed-chirho/                                # JSONL train/val/test splits
  models-chirho/
    simplifier-chirho/best-chirho/                   # Best checkpoint
  cards-chirho/
    simplifier-card-chirho.md                        # This model card
  config-chirho.yaml                                 # Training configuration
  spec-chirho/
    progress-chirho.sqlite                           # Agent progress log
	```

	## Training Reproducibility

	```bash
	# 1. Download Bible translations
	cd passage-difficulty-simplifier-chirho
	bun run src-chirho/data-chirho/download-translations-chirho.ts

	# 2. Build dual-task dataset
	bun run src-chirho/data-chirho/build-simplifier-dataset-chirho.ts

	# 3. Train model
	python src-chirho/train-chirho/train-simplifier-chirho.py

	# 4. Evaluate
	python src-chirho/eval-chirho/evaluate-chirho.py

	# 5. Upload to HuggingFace
	python src-chirho/upload-hf-chirho.py
	```

	## License

	MIT

	## Citation

	```bibtex
@misc{lovejesus2026passagedifficultysimplifier,
  title={Passage Difficulty Scorer \& Plain-Language Simplifier: Multi-Task Flan-T5 for Bible Readability},
  author={loveJesus},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/LoveJesus/passage-difficulty-simplifier-chirho}
}
	```

	---

	Built with love for Jesus. Published by [loveJesus](https://huggingface.co/LoveJesus).