Sentencepiece model file
Thank you for releasing the Gemma 4 model.
It appears that Gemma 4 uses a SentencePiece tokenizer very similar to the one in Gemma 3. Would you be able to upload the SentencePiece model file you used?
Hi @Bingsu,
Thank you for reaching out and notifying us about this. We have escalated this issue to our Engineering team for further investigation. We are tracking this internally and will provide an update on this thread as soon as we have a resolution or next steps.
That file is one bit off at offset 0x12. The byte 0x38 there declares field 7 of the enclosing NormalizerSpec, wire type 0 (varint).
The SentencePiece protobuf schema has no such field. NormalizerSpec in the schema defines fields 1..6 and allows extensions 200..max, so fields 7..199 are illegal per the official schema.
The most likely intended field is 5 (escape_whitespaces), encoded as byte 0x28 with the wire type unchanged. The 1-byte varint value that follows is 0, meaning false.
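For readers who want to check the arithmetic, here is a minimal sketch of how a single-byte protobuf tag splits into a field number and a wire type. The helper names `decode_tag` and `encode_tag` are made up for illustration and are not part of any library; the single-byte form only applies to field numbers below 16.

```python
def decode_tag(byte: int) -> tuple[int, int]:
    """Split a single-byte protobuf tag into (field_number, wire_type)."""
    # Low 3 bits hold the wire type, the remaining bits hold the field number.
    return byte >> 3, byte & 0x07


def encode_tag(field_number: int, wire_type: int) -> int:
    """Build a single-byte tag; only valid for field numbers below 16."""
    return (field_number << 3) | wire_type


observed = 0x38                # byte reported at offset 0x12
intended = encode_tag(5, 0)    # field 5 (escape_whitespaces), varint

print(decode_tag(observed))    # (7, 0)
print(hex(intended))           # 0x28

# XOR shows which bits differ between the observed and intended tag bytes.
diff = observed ^ intended
print(hex(diff), bin(diff).count("1"))  # 0x10 1
```

The XOR result has exactly one bit set, which matches the "one bit off" description.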
"1 bit off at address 0x12. Byte 0x38 declares field 7 of enclosing NormalizerSpec, wire type 0 (varint)"
What is the exact error when you used SentencePiece to load that file?
I linked the official schema it violates. The "error" is an illegal field identifier.
I'm not "loading" it into anything — just reading the file per the official spec.
There is no field 7 of NormalizerSpec defined and the schema does not permit extensions with this field identifier.
Ignore the further misread fields in the left pane — that's expected of parsing a streaming serialization format like ProtoBuf. Look up its specification for wire transport encoding.
Put simply: in whatever software you load it with, check whether NormalizerSpec.escape_whitespaces ends up enabled.
Google set it to disabled in the file, but because the field is misidentified, expect the value to be silently ignored and the setting to default to enabled.
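To illustrate the "silently ignored" behaviour, here is a rough sketch of a reader that skips unknown fields, which is how protobuf parsers commonly behave. It is a toy wire-format walker, not SentencePiece's actual loader. The function names are hypothetical, and the two-byte payloads are made-up stand-ins for the relevant part of a NormalizerSpec, not bytes copied from the real file. The default of `True` for escape_whitespaces follows the schema default described above.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def parse_escape_whitespaces(payload: bytes) -> tuple[bool, list[int]]:
    """Walk a NormalizerSpec-like payload, skipping fields it does not know.

    Returns (escape_whitespaces, field numbers outside 1..6 and 200..max).
    """
    escape_whitespaces = True  # assumed schema default when field 5 is absent
    illegal_fields = []
    pos = 0
    while pos < len(payload):
        tag, pos = read_varint(payload, pos)
        field, wire = tag >> 3, tag & 0x07
        value = None
        if wire == 0:                      # varint
            value, pos = read_varint(payload, pos)
        elif wire == 2:                    # length-delimited
            length, pos = read_varint(payload, pos)
            pos += length
        elif wire == 1:                    # fixed 64-bit
            pos += 8
        elif wire == 5:                    # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        if field == 5 and wire == 0:
            escape_whitespaces = bool(value)
        elif not (1 <= field <= 6 or field >= 200):
            illegal_fields.append(field)   # skipped, value discarded
    return escape_whitespaces, illegal_fields


# Hypothetical payloads: the same value 0, tagged as field 7 versus field 5.
print(parse_escape_whitespaces(b"\x38\x00"))  # (True, [7])
print(parse_escape_whitespaces(b"\x28\x00"))  # (False, [])
```

With the 0x38 tag the value 0 never reaches escape_whitespaces, so the setting stays at its default; with 0x28 it is applied and the setting becomes false.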