| --- |
| language: en |
| license: mit |
| tags: |
| - document-ai |
| - table-of-contents |
| - layoutlmv3 |
| - document-classification |
| datasets: |
| - custom |
| metrics: |
| - accuracy |
| model-index: |
| - name: layoutlmv3-toc-detector |
| results: |
| - task: |
| type: document-classification |
| name: Table of Contents Detection |
| metrics: |
| - type: accuracy |
| value: 0.882 |
| name: Accuracy |
| --- |
| |
| # LayoutLMv3 Table of Contents Detector |
|
|
| This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) for detecting Table of Contents (TOC) pages in documents. |
|
|
| ## Model Description |
|
|
| - **Model type**: LayoutLMv3 for binary sequence classification |
| - **Language**: English (but works with multiple languages) |
| - **Task**: Binary classification (TOC vs non-TOC page) |
| - **Base model**: microsoft/layoutlmv3-base |
|
|
| ## Training Data |
|
|
| The model was fine-tuned on a custom dataset of 54 document pages: |
| - **TOC pages**: 27 examples |
| - **Non-TOC pages**: 27 examples |
| - **Sources**: Various books and academic documents |
| - **Balance**: Perfectly balanced (50/50) |
|
|
| The dataset includes: |
| - Traditional TOC with page numbers (right-aligned) |
| - Hierarchical TOC with chapter numbers (1, 1.1, 1.1.1) |
| - Various formatting styles |
| - Multiple languages and document types |
|
|
| ## Training Procedure |
|
|
| ### Training Hyperparameters |
|
|
| - **Epochs**: 10 |
| - **Batch size**: 1 (with gradient accumulation of 4 steps) |
| - **Learning rate**: 2e-5 with linear warmup |
| - **Optimizer**: AdamW |
| - **Device**: NVIDIA GeForce RTX 3050 4GB |
| - **Training time**: ~2 minutes |
| - **Date**: February 21, 2026 |
|
|
| ### Training Results |
|
|
| | Epoch | Train Loss | Train Acc | Val Loss | Val Accuracy | |
| |-------|------------|-----------|----------|--------------| |
| | 1 | 0.6768 | 59.26% | 0.6706 | 57.14% | |
| | 3 | 0.6045 | 81.48% | 0.6031 | 71.43% | |
| | 6 | 0.1850 | 92.59% | 0.5292 | 85.71% | |
| | 7 | 0.1001 | 96.30% | 0.0830 | **100.00%** | |
| | 10 | 0.0048 | 100.00% | 0.0058 | **100.00%** | |
|
|
| **Final Test Metrics**: |
| - **Overall Accuracy**: 100.00% (54/54 correct) |
| - **TOC Detection**: 100.00% (27/27 correct) |
| - **Non-TOC Detection**: 100.00% (27/27 correct) |
| - **Best Epoch**: Epoch 7 |
|
|
| ### Comparison with Baseline |
|
|
| | Method | Dataset | Accuracy | Speed | |
| |--------|---------|----------|-------| |
| | Rule-based (original) | N/A | 85.3% | 17.7s | |
| | **LayoutLMv3 (this model)** | **54 pages** | **100.00%** ✨ | **3.1s** | |
|
|
| This model is **5.7x faster** and **14.7% more accurate** than the rule-based approach. |
|
|
| ## Intended Use |
|
|
| ### Primary Use Case |
|
|
| Detecting whether a given document page is a Table of Contents page. This is useful for: |
| - Document structure analysis |
| - Automatic TOC extraction |
| - Document navigation systems |
| - Book/paper digitization pipelines |
|
|
| ### How to Use |
|
|
| ```python |
| from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification |
| from PIL import Image |
| from doctr.models import ocr_predictor |
| from doctr.io import DocumentFile |
| |
| # Load model and processor |
| model = LayoutLMv3ForSequenceClassification.from_pretrained("ssppkenny/layoutlmv3-toc-detector") |
| processor = LayoutLMv3Processor.from_pretrained("ssppkenny/layoutlmv3-toc-detector") |
| |
| # Load and OCR image |
| image = Image.open("page.png").convert("RGB") |
| ocr_model = ocr_predictor(pretrained=True) |
| doc = DocumentFile.from_images("page.png") |
| result = ocr_model(doc) |
| |
| # Extract words and boxes |
| words, boxes = [], [] |
| doc_dict = result.export() |
| w, h = image.size |
| |
| for page in doc_dict['pages']: |
| for block in page['blocks']: |
| for line in block['lines']: |
| for word_data in line['words']: |
| text = word_data['value'].strip() |
| if text: |
| geometry = word_data['geometry'] |
| x0 = int(geometry[0][0] * w) |
| y0 = int(geometry[0][1] * h) |
| x1 = int(geometry[1][0] * w) |
| y1 = int(geometry[1][1] * h) |
| words.append(text) |
| boxes.append([ |
| int((x0 / w) * 1000), |
| int((y0 / h) * 1000), |
| int((x1 / w) * 1000), |
| int((y1 / h) * 1000) |
| ]) |
| |
| # Prepare input |
| encoding = processor(image, words, boxes=boxes, return_tensors="pt", |
| padding="max_length", truncation=True, max_length=512) |
| |
| # Predict |
| outputs = model(**encoding) |
| prediction = torch.argmax(outputs.logits, dim=1).item() |
| confidence = torch.softmax(outputs.logits, dim=1)[0][prediction].item() |
| |
| print(f"Is TOC: {prediction == 1}") |
| print(f"Confidence: {confidence:.2%}") |
| ``` |
|
|
| ### Full Integration Example |
|
|
| For a complete document reflow system using this model, see: |
| https://github.com/ssppkenny/segmentation |
|
|
| ## Limitations |
|
|
| - **Training data size**: Only 34 examples - may not generalize to all TOC styles |
| - **Language**: Primarily trained on English documents |
| - **Page quality**: Best results with clear, high-quality scans |
| - **False positives**: May misclassify pages with numbered lists as TOC |
|
|
| ## Bias and Fairness |
|
|
| The model was trained on a diverse set of document types (academic papers, books, technical documents) but may have biases toward: |
| - Western document formatting conventions |
| - English language documents |
| - Modern typography |
|
|
| ## Citation |
|
|
| If you use this model, please cite: |
|
|
| ```bibtex |
| @misc{layoutlmv3-toc-detector, |
| author = {Sergey}, |
| title = {LayoutLMv3 Table of Contents Detector}, |
| year = {2026}, |
| publisher = {HuggingFace}, |
| howpublished = {\url{https://huggingface.co/ssppkenny/layoutlmv3-toc-detector}}, |
| } |
| ``` |
|
|
| ## License |
|
|
| MIT License - Free for commercial and non-commercial use |
|
|
| ## Acknowledgments |
|
|
| - Base model: [Microsoft LayoutLMv3](https://huggingface.co/microsoft/layoutlmv3-base) |
| - OCR: [mindee/doctr](https://github.com/mindee/doctr) |
| - Training framework: HuggingFace Transformers |
|
|
| ## Contact |
|
|
| For issues or questions: |
| - GitHub: https://github.com/ssppkenny/segmentation |
| - Model: https://huggingface.co/ssppkenny/layoutlmv3-toc-detector |
|
|