# DeepSeek-OCR-2 Thai OCR (Trained Model)
This repository contains the final Thai OCR model weights, fine-tuned from unsloth/DeepSeek-OCR-2 and exported as a fully merged model for inference.
## Model Summary
- Task: Thai OCR text extraction (especially handwriting)
- Base model: `unsloth/DeepSeek-OCR-2`
- Fine-tuning method: LoRA
- Final artifact in this repo: merged full model (not adapter-only)
- Prompt format used during training/inference: `<image>\nOCR this image and output the Thai text.`
## Training Data
- Dataset: `iapp/thai_handwriting_dataset`
- Main training portions used: `train[0:10150]` (initial phase) and `train[10150:13600]` (continuation phase)
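The two split slices are contiguous and non-overlapping; a quick pure-Python sanity check of the sample counts (the "initial"/"continuation" phase labels follow the list above):

```python
# Index ranges corresponding to the two training phases
phase1 = range(0, 10150)       # initial fine-tuning phase
phase2 = range(10150, 13600)   # continuation phase

assert len(phase1) == 10150
assert len(phase2) == 3450
assert phase1.stop == phase2.start  # contiguous, no overlap
print(len(phase1) + len(phase2))    # -> 13600 total samples used
```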
## Training Configuration
LoRA configuration:
```python
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
r = 16
lora_alpha = 16
lora_dropout = 0
bias = "none"
```
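With `r = lora_alpha = 16`, the LoRA scaling factor `alpha / r` is 1.0, so the adapter delta is added to each target weight at full strength. A minimal pure-Python sketch of the update rule `W' = W + (alpha/r) * B @ A`, using tiny toy matrices rather than the real projection shapes:

```python
r, lora_alpha = 16, 16
scaling = lora_alpha / r  # 1.0: delta applied unscaled

def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# Toy LoRA factors: B is (d x r_toy), A is (r_toy x k), here with r_toy = 1
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.0]]
A = [[0.0, 1.0]]
delta = matmul(B, A)
W_new = [[w + scaling * d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
# W_new == [[1.0, 0.5], [0.0, 1.0]]
```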
Main optimization setup:
```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 2e-4
optim = "adamw_8bit"
lr_scheduler_type = "linear"
```
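With these settings the effective batch size is `per_device_train_batch_size * gradient_accumulation_steps = 4`, and the linear scheduler decays the learning rate from its peak toward zero over training. A minimal sketch of that decay (ignoring any warmup the actual scheduler may add):

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 4

def linear_lr(step, total_steps, peak_lr=2e-4):
    """Linearly decay the LR from peak_lr at step 0 to 0 at total_steps."""
    return peak_lr * max(0.0, 1.0 - step / total_steps)
```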
Vision preprocessing setup:
```python
base_size = 1024
image_size = 768
crop_mode = True
auto_resize = True
max_dynamic_crops = 6
```
## Final Model Weights in This Repo
The model published to Hugging Face is a merged model exported to safetensors shards.
Expected key files:
- `model-00001-of-00002.safetensors`
- `model-00002-of-00002.safetensors`
- `model.safetensors.index.json`
- `config.json`
- tokenizer and processor files
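When loading a sharded checkpoint, transformers resolves each tensor name through the `weight_map` in `model.safetensors.index.json`. A minimal sketch, using a hypothetical two-entry index, of checking which shard files the map references:

```python
import json

def shards_referenced(index):
    """Return the set of shard filenames named by a safetensors index dict."""
    return set(index["weight_map"].values())

# Hypothetical minimal index mirroring the structure of model.safetensors.index.json
index = json.loads("""{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}""")
expected = {"model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"}
assert shards_referenced(index) == expected
```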
## Quick Inference
```python
from unsloth import FastVisionModel  # import unsloth before transformers
from transformers import AutoModel

model, tokenizer = FastVisionModel.from_pretrained(
    model_name="phoritus/deepseek-ocr2-thai",
    auto_model=AutoModel,
    trust_remote_code=True,
    load_in_4bit=False,
    device_map="auto",
)

result = model.infer(
    tokenizer,
    prompt="<image>\nOCR this image and output the Thai text.",
    image_file="sample_image.jpg",
    output_path="output_results",
    base_size=1024,        # same vision settings as training
    image_size=768,
    crop_mode=True,
    save_results=True,
    test_compress=False,
)
print(result)
```
## Evaluation Snapshot (Infer Path, Random 10)
JiWER OCR test summary:
```
samples: 10
wer: 1.0
cer: 0.648695652173913
mer: 1.0
wil: 1.0
wip: 0.0
hits: 0
substitutions: 12
deletions: 2
insertions: 0
```

(A WER of 1.0 with 0 hits means no reference word was reproduced exactly; the character-level CER of ~0.65 is the more informative number here.)
Per-sample results:
| order | source_idx | sample_wer | sample_cer |
|---|---|---|---|
| 1 | 440 | 1.0 | 0.8823529411764706 |
| 2 | 996 | 1.0 | 0.17543859649122806 |
| 3 | 687 | 1.0 | 0.7 |
| 4 | 124 | 1.0 | 0.7065217391304348 |
| 5 | 408 | 1.0 | 0.5074626865671642 |
| 6 | 46 | 1.0 | 0.36 |
| 7 | 404 | 1.0 | 0.7659574468085106 |
| 8 | 254 | 1.0 | 0.7941176470588235 |
| 9 | 666 | 1.0 | 0.8490566037735849 |
| 10 | 978 | 1.0 | 0.8823529411764706 |
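The per-sample CER values above are character-level edit distances divided by the reference length, which is how jiwer defines CER. A minimal self-contained sketch of that computation:

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length."""
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, a hypothesis that gets one of six Thai characters wrong scores a CER of 1/6 ≈ 0.167, comparable to the best row in the table above.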
## Limitations
- Best quality is expected on inputs similar to the training data: Thai handwriting and receipt-like text.
- Performance can degrade on low-quality, rotated, or heavily noisy images.
- For production use, validate with your own evaluation set and prompt constraints.
## Acknowledgements
- Base model: `unsloth/DeepSeek-OCR-2`
- Libraries: Unsloth, Transformers, PEFT
- Dataset: `iapp/thai_handwriting_dataset`