Instructions to use datalab-to/surya-ocr-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use datalab-to/surya-ocr-2 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="datalab-to/surya-ocr-2")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained("datalab-to/surya-ocr-2")
    model = AutoModelForImageTextToText.from_pretrained("datalab-to/surya-ocr-2")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use datalab-to/surya-ocr-2 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "datalab-to/surya-ocr-2"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "datalab-to/surya-ocr-2",
        "messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": "Describe this image in one sentence."
              },
              {
                "type": "image_url",
                "image_url": {
                  "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                }
              }
            ]
          }
        ]
      }'

Use Docker
docker model run hf.co/datalab-to/surya-ocr-2
- SGLang
How to use datalab-to/surya-ocr-2 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "datalab-to/surya-ocr-2" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "datalab-to/surya-ocr-2",
        "messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": "Describe this image in one sentence."
              },
              {
                "type": "image_url",
                "image_url": {
                  "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                }
              }
            ]
          }
        ]
      }'

Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "datalab-to/surya-ocr-2" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "datalab-to/surya-ocr-2",
        "messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": "Describe this image in one sentence."
              },
              {
                "type": "image_url",
                "image_url": {
                  "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                }
              }
            ]
          }
        ]
      }'

- Docker Model Runner
How to use datalab-to/surya-ocr-2 with Docker Model Runner:
docker model run hf.co/datalab-to/surya-ocr-2
Surya
So Surya actually became Chandra but kept the name? Same Qwen3 fine-tuning. Why bother anyway?
Side thought: I can't wait for the guys from Alibaba to come up with a QWEN*-OCR, to see what will remain of all the spawns.
I'm being mean because old-school Surya was really good. But now, all you can see are QWEN spawns.
- Chandra OCR (Qwen-3-VL, 9B)
- Chandra OCR 2 (Qwen-3-VL, fine-tuned)
- Surya OCR 2 (Qwen-3-VL)
- olmOCR (Qwen2.5-VL, 7B)
- olmOCR-2 (Qwen2.5-VL, 8B)
- Nanonets-OCR2-3B (Qwen-based)
- DeepSeek-OCR-3B (Qwen backbone)
- PaddleOCR-VL-0.9B (Qwen backbone)
etc.
When we set out to redo surya, we were optimizing for wide compatibility, usability on low-end GPUs and CPUs, compatibility with marker, accuracy, and multilingual performance.
Surya is still widely used, and this is a meaningful upgrade for all of those people. We boosted accuracy significantly (olmocr score 75% to 83.3%), made the model smaller, collapsed secondary models (like table recognition) into one, made it CPU-compatible, and improved language compatibility.
This model makes architectural modifications to the LM head and embeddings (look at the param counts). This actually preserves the original Surya tokenizer behavior, and it does so for a clear reason: to improve memory utilization and accuracy.
But even if it had been a straight finetune, if it achieves goals/is useful, why are you against it? I can see from your Huggingface that you've finetuned models yourself.
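For anyone who wants to check the param counts mentioned above, here is a minimal sketch that reads tensor shapes from a `.safetensors` header and sums parameters per group, without loading any weights. The group keys `embed_tokens` and `lm_head` are an assumption based on common Transformers checkpoint naming, not names confirmed for this model, so adjust them to whatever the checkpoint actually contains.

```python
import json
import struct
from collections import defaultdict
from math import prod


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header (no weights loaded)."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata entry, not a tensor
    }


def param_counts_by_group(shapes, groups=("embed_tokens", "lm_head")):
    """Sum parameter counts per group; tensors matching no group key go to 'other'.

    The default group keys are hypothetical and follow common checkpoint naming.
    """
    counts = defaultdict(int)
    for name, shape in shapes.items():
        group = next((g for g in groups if g in name), "other")
        counts[group] += prod(shape)
    return dict(counts)
```

Running `param_counts_by_group(read_safetensors_shapes(path))` on this checkpoint and on a stock base checkpoint should make any difference in embedding or head size visible. For a sharded checkpoint, merge the shape dictionaries of all shards first. If the head is tied to the embeddings, it may not appear as a separate tensor in the file.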
I'm not against it. I just loved the old Surya and was not too happy to see it transformed into Chandra, but that's just my opinion. I like your work!
Yes, I have fine-tuned many models, including Surya, Chandra, and Chandra 2.
P.S. I just got sick of seeing QWEN3 OCRs everywhere 😁
I think Qwen does a good job on reasoning with small models, which leads many people to prefer it as a base model and modify the architecture to make it more robust. For me, this is still a great thing; at least the authors show that it uses Qwen (they can easily hide it lol). BTW, great work @vikp
Yeah, really great job, no doubt about it.
"They can easily hide it lol": are you sure you know what you are talking about?
They can easily do that, right? They can modify it and update their inference code. Why would that be hard? The architecture can stay the same or be modified, but either way they can easily change it; to make sure, you would have to dive into the model itself.
OK, do that and let me know. If I can't detect your modified/hidden Qwen model, I'll eat my words… I doubt it, though. But this discussion makes no sense anyway. Have a good day!
