
Local Apps

How to use timtkddn/ko-ocr-qwen2-vl-awq with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "timtkddn/ko-ocr-qwen2-vl-awq"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "timtkddn/ko-ocr-qwen2-vl-awq",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
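The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat`, the `max_tokens` default, and the OCR-style prompt in the usage comment are illustrative assumptions, not part of the model card. It assumes a server started as above is listening on `localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "timtkddn/ko-ocr-qwen2-vl-awq"


def build_chat_payload(image_url, prompt, max_tokens=512):
    """Build the same JSON body as the curl example, with a configurable prompt."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,  # assumed limit; tune for long documents
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server; the image URL is a placeholder):
# payload = build_chat_payload("https://example.com/receipt.jpg",
#                              "Extract all text from this image.")
# print(chat(payload))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 in the same way, since both expose the same chat completions route.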

Use Docker

docker model run hf.co/timtkddn/ko-ocr-qwen2-vl-awq

SGLang

How to use timtkddn/ko-ocr-qwen2-vl-awq with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "timtkddn/ko-ocr-qwen2-vl-awq" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "timtkddn/ko-ocr-qwen2-vl-awq",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "timtkddn/ko-ocr-qwen2-vl-awq" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "timtkddn/ko-ocr-qwen2-vl-awq",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use timtkddn/ko-ocr-qwen2-vl-awq with Docker Model Runner:
```
docker model run hf.co/timtkddn/ko-ocr-qwen2-vl-awq
```

ko-ocr-qwen2-vl-awq

Model Summary

ko-ocr-qwen2-vl-awq is a fine-tuned and quantized version of Qwen/Qwen2-VL-72B-Instruct, optimized for Korean OCR tasks. The model was trained with supervised fine-tuning (SFT) and further compressed using AWQ (Activation-aware Weight Quantization) for efficient inference with minimal performance loss.

Intended Use

This model is designed for OCR on Korean images. It can recognize text in natural scenes, scanned documents, and mixed-language content. It also supports general visual-language understanding, such as image captioning and question answering.

Requirements

The Qwen2-VL code is included in the latest Hugging Face transformers. We advise building from source with the command pip install git+https://github.com/huggingface/transformers; otherwise you might encounter the following error:

KeyError: 'qwen2_vl'
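Before loading the model, it can help to check which transformers version is installed. The sketch below uses only the standard library. The minimum version `4.45.0` is an assumption (to my knowledge that is the release in which Qwen2-VL support landed; verify against the transformers release notes), and the helper names are illustrative.

```python
import re
from importlib import metadata

# Assumption: Qwen2-VL support first shipped in transformers 4.45.0.
MIN_VERSION = (4, 45, 0)


def parse_version(text):
    """Turn a version string such as '4.45.0.dev0' into a (major, minor, patch) tuple."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_qwen2_vl(version_text):
    """Return True if the given transformers version is at least MIN_VERSION."""
    return parse_version(version_text) >= MIN_VERSION


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
elif not supports_qwen2_vl(version):
    print(f"transformers {version} is likely too old; expect KeyError: 'qwen2_vl'")
```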

Quickstart

We offer a toolkit to help you handle various types of visual input more conveniently. This includes base64, URLs, and interleaved images and videos. You can install it using the following command:

pip install qwen-vl-utils
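As a rough illustration of how the toolkit fits into inference, the sketch below builds a message from raw image bytes (base64 form) and runs one generation pass. It follows the usual Qwen2-VL usage pattern with `Qwen2VLForConditionalGeneration`, `AutoProcessor`, and `process_vision_info`; the helper names, the prompt, and `max_new_tokens=512` are assumptions, and loading an AWQ checkpoint is assumed to require an AWQ backend such as `autoawq` plus enough GPU memory for a 72B-class model.

```python
import base64

MODEL_ID = "timtkddn/ko-ocr-qwen2-vl-awq"


def image_entry_from_bytes(raw_bytes):
    """Wrap raw image bytes as a base64 data URI entry for the message content."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"type": "image", "image": f"data:image;base64,{encoded}"}


def build_messages(image_entry, prompt):
    """Build a single-turn conversation with one image and one text instruction."""
    return [
        {
            "role": "user",
            "content": [image_entry, {"type": "text", "text": prompt}],
        }
    ]


def run_ocr(messages, max_new_tokens=512):
    """Load the model and generate a reply (heavy imports are kept local)."""
    from qwen_vl_utils import process_vision_info
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    # Render the chat template, then extract the image inputs from the messages.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    return processor.batch_decode(
        trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]


# Usage (hypothetical file path):
# with open("/path/to/your/image.jpg", "rb") as handle:
#     entry = image_entry_from_bytes(handle.read())
# print(run_ocr(build_messages(entry, "Extract all text from this image.")))
```

A local file or a URL can be passed instead by setting `"image"` to a `file:///...` path or an `https://...` address.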

Image Resolution for performance boost

The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage.

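from transformers import AutoProcessor

# Pixel budget for a token count range of 256-1280 (28 * 28 pixels per token).
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    "timtkddn/ko-ocr-qwen2-vl-awq", min_pixels=min_pixels, max_pixels=max_pixels
)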

In addition, we provide two methods for fine-grained control over the size of the image passed to the model:

Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels.
Specify exact dimensions: Directly set resized_height and resized_width. These values will be rounded to the nearest multiple of 28.

# resized_height and resized_width
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "file:///path/to/your/image.jpg",
                "resized_height": 280,
                "resized_width": 420,
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# min_pixels and max_pixels
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "file:///path/to/your/image.jpg",
                "min_pixels": 50176,
                "max_pixels": 50176,
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
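To make the relationship between pixel budgets and token counts concrete, here is a small arithmetic sketch. It assumes one visual token per 28 x 28 pixel area, as implied by the `256 * 28 * 28` setting above. The function `estimate_resize` is a simplified approximation written for illustration, not the library's actual resizing code, so real outputs may differ slightly.

```python
import math

PATCH = 28  # side length in pixels of the area assumed to map to one visual token


def round_to_patch(value):
    """Round a dimension to the nearest multiple of PATCH (at least one patch)."""
    return max(PATCH, round(value / PATCH) * PATCH)


def visual_tokens(height, width):
    """Estimate the token count for an image whose sides are multiples of PATCH."""
    return (height // PATCH) * (width // PATCH)


def estimate_resize(height, width,
                    min_pixels=256 * PATCH * PATCH,
                    max_pixels=1280 * PATCH * PATCH):
    """Approximate the resized (height, width) that keeps aspect ratio within the budget."""
    new_h, new_w = round_to_patch(height), round_to_patch(width)
    if new_h * new_w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        new_h = math.floor(height / scale / PATCH) * PATCH
        new_w = math.floor(width / scale / PATCH) * PATCH
    elif new_h * new_w < min_pixels:
        # Too small: enlarge both sides by the same factor, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        new_h = math.ceil(height * scale / PATCH) * PATCH
        new_w = math.ceil(width * scale / PATCH) * PATCH
    return new_h, new_w


# Exact dimensions from the first example: 280 x 420 is 10 x 15 patches.
print(visual_tokens(280, 420))

# A 1000 x 1000 image under the 256-1280 token budget.
resized = estimate_resize(1000, 1000)
print(resized, visual_tokens(*resized))
```

Under these assumptions, the 280 x 420 setting corresponds to 150 tokens, and a fixed budget of 50176 pixels (64 * 28 * 28) corresponds to 64 tokens.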


Model size: 73B params (Safetensors)
Tensor types: I32, F16

Model tree for timtkddn/ko-ocr-qwen2-vl-awq

Base model: Qwen/Qwen2-VL-72B
Finetuned: Qwen/Qwen2-VL-72B-Instruct
Quantized: this model

Paper for timtkddn/ko-ocr-qwen2-vl-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (arXiv:2306.00978, published Jun 1, 2023)