- vLLM
How to use timtkddn/ko-ocr-qwen2-vl-awq with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "timtkddn/ko-ocr-qwen2-vl-awq"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "timtkddn/ko-ocr-qwen2-vl-awq",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/timtkddn/ko-ocr-qwen2-vl-awq
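The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names (`to_data_url`, `build_payload`, `chat`) are hypothetical, and passing a local image as a base64 data URL assumes the server accepts data URLs in the `image_url` field. For the SGLang server, pass `base_url="http://localhost:30000"`.

```python
import base64
import json
import mimetypes
import urllib.request

MODEL_ID = "timtkddn/ko-ocr-qwen2-vl-awq"


def to_data_url(path):
    """Encode a local image file as a base64 data URL (assumed to be accepted by the server)."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(image_url, prompt):
    """Build the same chat-completions body as the curl example."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(image_url, prompt, base_url="http://localhost:8000"):
    """POST the payload to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(image_url, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, a call such as `chat(to_data_url("receipt.jpg"), "Extract all text from this image.")` would return the model's answer as a string; the file name and prompt here are placeholders.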
- SGLang
How to use timtkddn/ko-ocr-qwen2-vl-awq with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "timtkddn/ko-ocr-qwen2-vl-awq" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "timtkddn/ko-ocr-qwen2-vl-awq",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "timtkddn/ko-ocr-qwen2-vl-awq" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "timtkddn/ko-ocr-qwen2-vl-awq",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use timtkddn/ko-ocr-qwen2-vl-awq with Docker Model Runner:
docker model run hf.co/timtkddn/ko-ocr-qwen2-vl-awq
ko-ocr-qwen2-vl-awq
Model Summary
ko-ocr-qwen2-vl-awq is a fine-tuned and quantized version of Qwen/Qwen2-VL-72B-Instruct, optimized for Korean OCR tasks. The model was trained with supervised fine-tuning (SFT) and further compressed using AWQ (Activation-aware Weight Quantization) for efficient inference with minimal performance loss.
Intended Use
This model is designed for OCR on images containing Korean text. It recognizes text in natural scenes, scanned documents, and mixed-language content. It also supports general vision-language tasks such as image captioning and visual question answering.
Requirements
The Qwen2-VL code has been merged into the latest Hugging Face transformers. We advise building from source with pip install git+https://github.com/huggingface/transformers; otherwise you may encounter the following error:
KeyError: 'qwen2_vl'
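Before loading the model, you can check whether the installed transformers knows the `qwen2_vl` model type. This is a rough sketch: the helper name `has_model_type` is hypothetical, and it assumes the error comes from a lookup in `CONFIG_MAPPING`, the registry transformers exposes for its auto classes.

```python
def has_model_type(model_type, config_mapping=None):
    """Return True if `model_type` is registered in the given config mapping."""
    if config_mapping is None:
        # Imported lazily so the helper itself does not require transformers.
        from transformers import CONFIG_MAPPING

        config_mapping = CONFIG_MAPPING
    return model_type in config_mapping
```

If `has_model_type("qwen2_vl")` returns `False`, the installed transformers predates Qwen2-VL support and should be upgraded or built from source as described above.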
Quickstart
We offer a toolkit to help you handle various types of visual input more conveniently. This includes base64, URLs, and interleaved images and videos. You can install it using the following command:
pip install qwen-vl-utils
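With the toolkit installed, a single-image OCR request might look like the sketch below. It follows the usual Qwen2-VL pattern in transformers (`Qwen2VLForConditionalGeneration`, `AutoProcessor`, and `process_vision_info` from qwen-vl-utils). The helper names `build_messages` and `run_ocr`, the default prompt, and `max_new_tokens=512` are illustrative assumptions, not settings from this model card. Loading AWQ weights may also require an AWQ backend such as autoawq, and the model needs enough GPU memory to hold the quantized weights.

```python
MODEL_ID = "timtkddn/ko-ocr-qwen2-vl-awq"


def build_messages(image, prompt):
    """Build a single-turn chat with one image and one text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_ocr(image, prompt="Read all the text in this image.", max_new_tokens=512):
    """Load the model, run one generation, and return the decoded answer."""
    # Heavy imports stay local so build_messages works without a GPU stack.
    from qwen_vl_utils import process_vision_info
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_messages(image, prompt)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated answer is decoded.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

The `image` argument can be a URL, a `file://` path, or a base64 string, matching the input types the toolkit handles.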
Image Resolution for performance boost
The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage.
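from transformers import AutoProcessor

# Visual token budget: each visual token covers a 28 x 28 pixel area.
min_pixels = 256 * 28 * 28   # lower bound, about 256 visual tokens
max_pixels = 1280 * 28 * 28  # upper bound, about 1280 visual tokens
processor = AutoProcessor.from_pretrained(
    "timtkddn/ko-ocr-qwen2-vl-awq", min_pixels=min_pixels, max_pixels=max_pixels
)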
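The pixel limits map to visual token counts through the factor 28 * 28: each visual token covers roughly one 28 x 28 pixel area. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of transformers or qwen-vl-utils.

```python
PATCH = 28  # side length in pixels of the area covered by one visual token


def pixels_for_tokens(num_tokens):
    """Pixel budget that corresponds to a given number of visual tokens."""
    return num_tokens * PATCH * PATCH


def tokens_for_pixels(num_pixels):
    """Approximate number of visual tokens for a given pixel budget."""
    return num_pixels // (PATCH * PATCH)
```

For example, `pixels_for_tokens(256)` gives 200704 and `pixels_for_tokens(1280)` gives 1003520, which are the `min_pixels` and `max_pixels` values used above.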
We also provide two methods for fine-grained control over the size of the image passed to the model:
- Define min_pixels and max_pixels: Images are resized so that their total pixel count falls within the range of min_pixels and max_pixels while keeping their aspect ratio.
- Specify exact dimensions: Directly set resized_height and resized_width. These values are rounded to the nearest multiple of 28.
# resized_height and resized_width
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "file:///path/to/your/image.jpg",
                "resized_height": 280,
                "resized_width": 420,
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# min_pixels and max_pixels
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "file:///path/to/your/image.jpg",
                "min_pixels": 50176,
                "max_pixels": 50176,
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
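To see what rounding to a multiple of 28 does to requested dimensions, here is a small sketch. It approximates the rule described above and is not the toolkit's own implementation; `round_to_multiple` and `resized_grid` are hypothetical names, and Python's built-in `round` resolves exact ties to the even neighbor.

```python
def round_to_multiple(value, factor=28):
    """Round `value` to the nearest multiple of `factor`, never below `factor`."""
    return max(factor, round(value / factor) * factor)


def resized_grid(height, width, factor=28):
    """Return (rounded_height, rounded_width, visual_token_count)."""
    h = round_to_multiple(height, factor)
    w = round_to_multiple(width, factor)
    return h, w, (h // factor) * (w // factor)
```

The 280 x 420 example is already aligned, so `resized_grid(280, 420)` leaves it unchanged and yields 10 * 15 = 150 visual tokens, whereas a request for 300 x 400 would be rounded to 308 x 392.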