DeepSeek-OCR-2 Thai OCR (Trained Model)

This repository contains the final Thai OCR model weights, fine-tuned from unsloth/DeepSeek-OCR-2 with LoRA and exported as a full merged model for inference.

Model Summary

  • Task: Thai OCR text extraction (especially handwriting)
  • Base model: unsloth/DeepSeek-OCR-2
  • Fine-tuning method: LoRA
  • Final artifact in this repo: merged full model (not adapter-only)
  • Prompt format used during training/inference:
    • <image>\nOCR this image and output the Thai text.

Training Data

  • Dataset: iapp/thai_handwriting_dataset
  • Main training portions used:
    • train[0:10150]
    • train[10150:13600] (continuation phase)
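
The `train[a:b]` split strings follow Hugging Face datasets' slicing syntax, which behaves like Python slices, so the two phases cover 10,150 and 3,450 examples respectively. A quick sanity check:

```python
# The train[a:b] split strings behave like Python slices over the dataset.
phase1 = slice(0, 10150)        # initial phase
phase2 = slice(10150, 13600)    # continuation phase

def split_len(s: slice) -> int:
    # Number of examples selected by a train[a:b] range.
    return s.stop - s.start

print(split_len(phase1), split_len(phase2))  # 10150 3450
```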

Training Configuration

LoRA configuration:

  • target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • r = 16
  • lora_alpha = 16
  • lora_dropout = 0
  • bias = "none"
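
For intuition, LoRA adds two low-rank matrices per targeted linear layer, so the adapter size and update scaling follow directly from the numbers above. A minimal sketch (the hidden size of 2048 is a hypothetical placeholder, not the model's real dimension):

```python
# LoRA adds A (r x d_in) and B (d_out x r) to each targeted linear layer.
# d_in/d_out = 2048 is assumed here purely for illustration.
r, lora_alpha = 16, 16
d_in = d_out = 2048  # hypothetical layer dims

lora_params_per_layer = r * d_in + d_out * r  # extra trainable params per layer
scaling = lora_alpha / r                      # update scale; 1.0 means no rescaling

print(lora_params_per_layer, scaling)
```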

Main optimization setup:

  • per_device_train_batch_size = 1
  • gradient_accumulation_steps = 4
  • learning_rate = 2e-4
  • optim = "adamw_8bit"
  • lr_scheduler_type = "linear"
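
These settings imply an effective batch size of 4 (per-device batch × gradient accumulation) and a learning rate that decays linearly from 2e-4 toward zero. A small sketch of both:

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 2e-4

# Gradients are accumulated over 4 micro-batches before each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    # Shape of the "linear" scheduler (no warmup): decay from base_lr to 0.
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(effective_batch, linear_lr(500, 1000))  # 4, halfway through -> 1e-4
```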

Vision preprocessing setup:

  • base_size = 1024
  • image_size = 768
  • crop_mode = True
  • auto_resize = True
  • max_dynamic_crops = 6
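
As a rough illustration of how these settings interact (this is a simplified sketch of the general dynamic-crop idea, not the exact DeepSeek-OCR preprocessing code): with crop_mode enabled, the base canvas is tiled into image_size crops, with the crop count capped by max_dynamic_crops.

```python
import math

# Hypothetical sketch of crop_mode tiling; the real DeepSeek-OCR logic
# (aspect-ratio matching, padding) is more involved.
base_size, image_size, max_dynamic_crops = 1024, 768, 6

tiles_per_side = math.ceil(base_size / image_size)        # 2 tiles per side
n_crops = min(tiles_per_side ** 2, max_dynamic_crops)     # capped at 6

print(n_crops)  # 4
```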

Final Model Weights in This Repo

The model published to Hugging Face is a merged model exported to safetensors shards.

Expected key files:

  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors
  • model.safetensors.index.json
  • config.json
  • tokenizer and processor files
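
The index file uses the standard Transformers sharded-checkpoint format: a `weight_map` dict mapping each tensor name to the shard that stores it. A sketch of inspecting it (the tensor names below are made-up placeholders, not the model's real keys):

```python
import json
from collections import defaultdict

# Fabricated example of a model.safetensors.index.json payload.
index_text = json.dumps({
    "metadata": {"total_size": 6_000_000_000},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
})

# Group tensor names by the shard file that holds them.
index = json.loads(index_text)
shards = defaultdict(list)
for tensor_name, shard_file in index["weight_map"].items():
    shards[shard_file].append(tensor_name)

print(sorted(shards))
```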

Quick Inference

```python
from transformers import AutoModel
from unsloth import FastVisionModel

# Load the merged model; trust_remote_code is required for the
# DeepSeek-OCR architecture and its custom infer() method.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="phoritus/deepseek-ocr2-thai",
    auto_model=AutoModel,
    trust_remote_code=True,
    load_in_4bit=False,
    device_map="auto",
)

FastVisionModel.for_inference(model)  # switch to inference mode

# Use the same prompt and vision settings as during training.
result = model.infer(
    tokenizer,
    prompt="<image>\nOCR this image and output the Thai text.",
    image_file="sample_image.jpg",
    output_path="output_results",
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=True,
    test_compress=False,
)

print(result)
```

Evaluation Snapshot (infer path, 10 random samples)

JiWER OCR test summary:

  • samples: 10
  • wer: 1.0
  • cer: 0.648695652173913
  • mer: 1.0
  • wil: 1.0
  • wip: 0.0
  • hits: 0
  • substitutions: 12
  • deletions: 2
  • insertions: 0
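
The CER above is edit-distance based: CER = (S + D + I) / N over characters, where N is the reference length. A minimal pure-Python sketch of that definition (jiwer computes the same quantity, with additional normalization options):

```python
def edit_distance(ref: str, hyp: str) -> int:
    # Wagner-Fischer dynamic programming over characters.
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # CER = (substitutions + deletions + insertions) / reference length.
    return edit_distance(ref, hyp) / len(ref)

print(cer("abc", "axc"))  # one substitution over 3 chars -> 0.333...
```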

Per-sample results:

order  source_idx  sample_wer  sample_cer
1      440         1.0         0.8823529411764706
2      996         1.0         0.17543859649122806
3      687         1.0         0.7
4      124         1.0         0.7065217391304348
5      408         1.0         0.5074626865671642
6      46          1.0         0.36
7      404         1.0         0.7659574468085106
8      254         1.0         0.7941176470588235
9      666         1.0         0.8490566037735849
10     978         1.0         0.8823529411764706
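
Note that the unweighted (macro) mean of the per-sample CERs is about 0.662, slightly above the summary CER of 0.649; the gap is consistent with the summary being aggregated over all characters at once (micro average), so longer samples weigh more. This is an inference from the numbers, not a documented detail:

```python
# Per-sample CER values from the table above.
sample_cer = [0.8823529411764706, 0.17543859649122806, 0.7,
              0.7065217391304348, 0.5074626865671642, 0.36,
              0.7659574468085106, 0.7941176470588235,
              0.8490566037735849, 0.8823529411764706]

# Macro average: every sample counts equally, regardless of length.
macro_cer = sum(sample_cer) / len(sample_cer)
print(round(macro_cer, 3))  # 0.662
```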

Limitations

  • Best quality is expected on Thai receipt/handwriting-like inputs similar to training data.
  • Performance can degrade on low-quality, rotated, or heavily noisy images.
  • For production use, validate with your own evaluation set and prompt constraints.

Acknowledgements

  • Base model: unsloth/DeepSeek-OCR-2
  • Libraries: Unsloth, Transformers, PEFT
  • Dataset: iapp/thai_handwriting_dataset

Model size: ~3B parameters, BF16 tensors, stored as safetensors.