# granite-docling ONNX Conversion Guide

## Technical Reproduction Instructions

This document provides complete instructions for reproducing the granite-docling ONNX conversion.

### Prerequisites

- Python 3.10+
- ~4 GB available RAM
- ~2 GB disk space for the conversion environment

### Step 1: Environment Setup

```bash
# Create an isolated environment
python3 -m venv onnx_converter
source onnx_converter/bin/activate   # Linux/macOS
# onnx_converter\Scripts\activate    # Windows (run this instead)

# Install dependencies (the quotes keep shells such as zsh from expanding the brackets)
pip install torch torchvision transformers "optimum[onnxruntime]" safetensors
```

### Step 2: Download Original Model

```bash
# Download the granite-docling SafeTensors model and its config files
mkdir -p granite-docling-258m
cd granite-docling-258m

# -f makes curl fail on HTTP errors instead of saving an error page
curl -fL "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/model.safetensors" -o model.safetensors
curl -fL "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/config.json" -o config.json
curl -fL "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/tokenizer.json" -o tokenizer.json
curl -fL "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/preprocessor_config.json" -o preprocessor_config.json

# Return to the working directory so later steps can refer to ./granite-docling-258m
cd ..
```

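Before converting, it can help to confirm that all four files from the commands above arrived and are non-empty. The following is a minimal sketch that assumes the directory layout from this step; the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Files downloaded in Step 2
REQUIRED_FILES = (
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "preprocessor_config.json",
)


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in required:
        path = root / name
        # A zero-byte file usually means the download was interrupted
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_model_files('./granite-docling-258m')` should return an empty list when the download succeeded.
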
### Step 3: Install IBM Experimental Fork

```bash
# Clone the IBM experimental optimum-onnx fork and switch to the Idefics3 support branch
git clone https://github.com/gabe-l-hart/optimum-onnx.git
cd optimum-onnx
git checkout Idefics3Support

# Install the experimental fork in editable mode
pip install -e . --force-reinstall

# Return to the parent directory
cd ..
```

### Step 4: Convert to ONNX

```python
import os

# Hide GPUs before torch is imported so the export runs on CPU
os.environ['CUDA_VISIBLE_DEVICES'] = ''

from pathlib import Path

import torch
from transformers import Idefics3ForConditionalGeneration
from optimum.exporters.onnx import export
from optimum.exporters.onnx.model_configs import Idefics3OnnxConfig

# Load the model in float32.
# Run this script from the directory that contains ./granite-docling-258m
model = Idefics3ForConditionalGeneration.from_pretrained(
    './granite-docling-258m',
    trust_remote_code=True,
    torch_dtype=torch.float32
).to('cpu')

# Create the ONNX export config (provided by the experimental fork)
onnx_config = Idefics3OnnxConfig(model.config, task='image-to-text')

# Export to ONNX; 17 is the ONNX opset version
output_path = Path('./granite_docling.onnx')
export(model, onnx_config, output_path, 17)

print(f"ONNX conversion complete: {output_path}")
```

### Expected Output

```
Initializing Idefics3ModelPatcher
Entering Idefics3ModelPatcher context
Patching Idefics3 model
Using patched position embedding forward
Exiting Idefics3ModelPatcher context
ONNX conversion complete: granite_docling.onnx
```

The exported `granite_docling.onnx` file is about 1.2 GB.

### Validation

```python
import onnxruntime as ort

# Test ONNX model loading
session = ort.InferenceSession('granite_docling.onnx')
print("✅ ONNX model loads successfully")

# Check input/output specifications
for inp in session.get_inputs():
    print(f"Input: {inp.name} - {inp.shape}")
for out in session.get_outputs():
    print(f"Output: {out.name} - {out.shape}")
```

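The shapes printed above may contain symbolic dimensions (names given as strings, or `None`) for axes that were exported as dynamic. To run a smoke test, each symbolic dimension needs a concrete value. The following is a minimal sketch; `concretize_shape` and its default size are hypothetical, and the right values depend on the input names the export actually produces.

```python
def concretize_shape(shape, overrides=None, default=1):
    """Replace symbolic or unknown dimensions with concrete integers.

    shape: list as reported by an onnxruntime input (ints, strings, or None)
    overrides: optional mapping from symbolic dimension name to size
    default: size used for any symbolic dimension not listed in overrides
    """
    overrides = overrides or {}
    concrete = []
    for dim in shape:
        if isinstance(dim, int):
            concrete.append(dim)             # fixed dimension, keep as-is
        elif isinstance(dim, str) and dim in overrides:
            concrete.append(overrides[dim])  # named dynamic axis with a chosen size
        else:
            concrete.append(default)         # unnamed or unlisted dynamic axis
    return concrete
```

With illustrative dimension names, `concretize_shape(['batch_size', 3, 'height', 'width'], {'height': 512, 'width': 512})` gives `[1, 3, 512, 512]`, which can then be used to allocate a zero-filled array for `session.run`.
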
## Troubleshooting

### Common Issues

1. **"Custom architecture" error**: Make sure the IBM experimental fork from Step 3 is the version installed in the active environment.
2. **Memory errors**: Run the conversion on CPU only by setting `CUDA_VISIBLE_DEVICES=''`.
3. **Import errors**: Verify that the experimental fork was installed in editable mode with `pip install -e .`.

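One quick way to tell whether the fork is what Python actually imports is to check that the config class used in Step 4 is importable. The following is a minimal sketch; the helper name `can_import` is hypothetical.

```python
import importlib


def can_import(module_name, attr_name):
    """Return True if module_name imports cleanly and exposes attr_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr_name)
```

If `can_import("optimum.exporters.onnx.model_configs", "Idefics3OnnxConfig")` returns `False`, the active environment is probably not using the experimental fork.
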
### Technical Notes

- **Conversion time**: 5-10 minutes on a typical CPU
- **Memory usage**: ~4 GB RAM during conversion
- **Warnings**: `TracerWarning` messages are expected when tracing a complex VLM
- **File size**: ONNX (~1.2 GB) vs SafeTensors (~492 MB). The ONNX file includes the graph, and the model is loaded as `torch.float32` for export, so each weight takes 4 bytes (258M parameters × 4 bytes ≈ 1.03 GB)

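The file sizes above are roughly what a parameter-count estimate predicts. The following is a back-of-the-envelope sketch that takes the 258M in the model name as an approximate parameter count; it ignores graph metadata and any tensors stored at a different precision, and the helper name `weights_size_gb` is hypothetical.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_size_gb(num_params, dtype):
    """Approximate size of the raw weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9
```

Under these assumptions, `weights_size_gb(258_000_000, "float32")` is about 1.03, and a 2-byte type gives about 0.52, which is close to the ONNX and SafeTensors sizes listed above.
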
## Attribution

- Original model: IBM Research granite-docling-258M
- Conversion method: IBM experimental Idefics3Support optimum-onnx fork
- Documentation: lamco-development