DeepSeek-OCR-2 Thai OCR (Trained Model)

This repository contains the final Thai OCR model weights, fine-tuned from unsloth/DeepSeek-OCR-2 with LoRA and exported as a full merged model for inference.

Model Summary

  • Task: Thai OCR text extraction (especially handwriting)
  • Base model: unsloth/DeepSeek-OCR-2
  • Fine-tuning method: LoRA
  • Final artifact in this repo: merged full model (not adapter-only)
  • Prompt format used during training/inference:
    • <image>\nOCR this image and output the Thai text.

Training Data

  • Dataset: iapp/thai_handwriting_dataset
  • Main training portions used:
    • train[0:10150]
    • train[10150:13600] (continuation phase)
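
The `train[a:b]` split strings follow Hugging Face datasets' slicing syntax, which behaves like Python slices, so the two phases cover 10,150 and 3,450 examples respectively. A quick sanity check:

```python
# The train[a:b] split strings behave like Python slices over the dataset.
phase1 = slice(0, 10150)        # initial phase
phase2 = slice(10150, 13600)    # continuation phase

def split_len(s: slice) -> int:
    # Number of examples selected by a train[a:b] range.
    return s.stop - s.start

print(split_len(phase1), split_len(phase2))  # 10150 3450
```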

Training Configuration

LoRA configuration:

  • target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • r = 16
  • lora_alpha = 16
  • lora_dropout = 0
  • bias = "none"
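
For intuition, LoRA adds two low-rank matrices per targeted linear layer, so the adapter size and update scaling follow directly from the numbers above. A minimal sketch (the hidden size of 2048 is a hypothetical placeholder, not the model's real dimension):

```python
# LoRA adds A (r x d_in) and B (d_out x r) to each targeted linear layer.
# d_in/d_out = 2048 is assumed here purely for illustration.
r, lora_alpha = 16, 16
d_in = d_out = 2048  # hypothetical layer dims

lora_params_per_layer = r * d_in + d_out * r  # extra trainable params per layer
scaling = lora_alpha / r                      # update scale; 1.0 means no rescaling

print(lora_params_per_layer, scaling)
```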

Main optimization setup:

  • per_device_train_batch_size = 1
  • gradient_accumulation_steps = 4
  • learning_rate = 2e-4
  • optim = "adamw_8bit"
  • lr_scheduler_type = "linear"
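
These settings imply an effective batch size of 4 (per-device batch × gradient accumulation) and a learning rate that decays linearly from 2e-4 toward zero. A small sketch of both:

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 2e-4

# Gradients are accumulated over 4 micro-batches before each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    # Shape of the "linear" scheduler (no warmup): decay from base_lr to 0.
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(effective_batch, linear_lr(500, 1000))  # 4, halfway through -> 1e-4
```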

Vision preprocessing setup:

  • base_size = 1024
  • image_size = 768
  • crop_mode = True
  • auto_resize = True
  • max_dynamic_crops = 6
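
As a rough illustration of how these settings interact (this is a simplified sketch of the general dynamic-crop idea, not the exact DeepSeek-OCR preprocessing code): with crop_mode enabled, the base canvas is tiled into image_size crops, with the crop count capped by max_dynamic_crops.

```python
import math

# Hypothetical sketch of crop_mode tiling; the real DeepSeek-OCR logic
# (aspect-ratio matching, padding) is more involved.
base_size, image_size, max_dynamic_crops = 1024, 768, 6

tiles_per_side = math.ceil(base_size / image_size)        # 2 tiles per side
n_crops = min(tiles_per_side ** 2, max_dynamic_crops)     # capped at 6

print(n_crops)  # 4
```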

Final Model Weights in This Repo

The model published to Hugging Face is a merged model exported to safetensors shards.

Expected key files:

  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors
  • model.safetensors.index.json
  • config.json
  • tokenizer and processor files
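
The index file uses the standard Transformers sharded-checkpoint format: a `weight_map` dict mapping each tensor name to the shard that stores it. A sketch of inspecting it (the tensor names below are made-up placeholders, not the model's real keys):

```python
import json
from collections import defaultdict

# Fabricated example of a model.safetensors.index.json payload.
index_text = json.dumps({
    "metadata": {"total_size": 6_000_000_000},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
})

# Group tensor names by the shard file that holds them.
index = json.loads(index_text)
shards = defaultdict(list)
for tensor_name, shard_file in index["weight_map"].items():
    shards[shard_file].append(tensor_name)

print(sorted(shards))
```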

Quick Inference

```python
from transformers import AutoModel
from unsloth import FastVisionModel

# Load the merged model; trust_remote_code is required for the
# DeepSeek-OCR architecture and its custom infer() method.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="phoritus/deepseek-ocr2-thai",
    auto_model=AutoModel,
    trust_remote_code=True,
    load_in_4bit=False,
    device_map="auto",
)

FastVisionModel.for_inference(model)  # switch to inference mode

# Use the same prompt and vision settings as during training.
result = model.infer(
    tokenizer,
    prompt="<image>\nOCR this image and output the Thai text.",
    image_file="sample_image.jpg",
    output_path="output_results",
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=True,
    test_compress=False,
)

print(result)
```

Evaluation Snapshot (infer path, 10 random samples)

JiWER OCR test summary:

  • samples: 10
  • wer: 1.0
  • cer: 0.648695652173913
  • mer: 1.0
  • wil: 1.0
  • wip: 0.0
  • hits: 0
  • substitutions: 12
  • deletions: 2
  • insertions: 0
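
The CER above is edit-distance based: CER = (S + D + I) / N over characters, where N is the reference length. A minimal pure-Python sketch of that definition (jiwer computes the same quantity, with additional normalization options):

```python
def edit_distance(ref: str, hyp: str) -> int:
    # Wagner-Fischer dynamic programming over characters.
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # CER = (substitutions + deletions + insertions) / reference length.
    return edit_distance(ref, hyp) / len(ref)

print(cer("abc", "axc"))  # one substitution over 3 chars -> 0.333...
```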

Per-sample results:

order  source_idx  sample_wer  sample_cer
1      440         1.0         0.8823529411764706
2      996         1.0         0.17543859649122806
3      687         1.0         0.7
4      124         1.0         0.7065217391304348
5      408         1.0         0.5074626865671642
6      46          1.0         0.36
7      404         1.0         0.7659574468085106
8      254         1.0         0.7941176470588235
9      666         1.0         0.8490566037735849
10     978         1.0         0.8823529411764706
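
Note that the unweighted (macro) mean of the per-sample CERs is about 0.662, slightly above the summary CER of 0.649; the gap is consistent with the summary being aggregated over all characters at once (micro average), so longer samples weigh more. This is an inference from the numbers, not a documented detail:

```python
# Per-sample CER values from the table above.
sample_cer = [0.8823529411764706, 0.17543859649122806, 0.7,
              0.7065217391304348, 0.5074626865671642, 0.36,
              0.7659574468085106, 0.7941176470588235,
              0.8490566037735849, 0.8823529411764706]

# Macro average: every sample counts equally, regardless of length.
macro_cer = sum(sample_cer) / len(sample_cer)
print(round(macro_cer, 3))  # 0.662
```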

Limitations

  • Best quality is expected on Thai receipt/handwriting-like inputs similar to training data.
  • Performance can degrade on low-quality, rotated, or heavily noisy images.
  • For production use, validate with your own evaluation set and prompt constraints.

Acknowledgements

  • Base model: unsloth/DeepSeek-OCR-2
  • Libraries: Unsloth, Transformers, PEFT
  • Dataset: iapp/thai_handwriting_dataset

Model size: ~3B parameters, BF16 tensors, stored as safetensors.