Instructions to use google/diffusiongemma-26B-A4B-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/diffusiongemma-26B-A4B-it with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="google/diffusiongemma-26B-A4B-it") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("google/diffusiongemma-26B-A4B-it") model = AutoModelForMultimodalLM.from_pretrained("google/diffusiongemma-26B-A4B-it") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] inputs = processor.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use google/diffusiongemma-26B-A4B-it with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "google/diffusiongemma-26B-A4B-it" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/diffusiongemma-26B-A4B-it", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker
docker model run hf.co/google/diffusiongemma-26B-A4B-it
- SGLang
How to use google/diffusiongemma-26B-A4B-it with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "google/diffusiongemma-26B-A4B-it" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/diffusiongemma-26B-A4B-it", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "google/diffusiongemma-26B-A4B-it" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/diffusiongemma-26B-A4B-it", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }' - Docker Model Runner
How to use google/diffusiongemma-26B-A4B-it with Docker Model Runner:
docker model run hf.co/google/diffusiongemma-26B-A4B-it
Runtime error and Module not found error
/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:112: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
config.json: 100%
3.47k/3.47k [00:00<00:00, 373kB/s]
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py in getattr(self, name)
2261 try:
-> 2262 module = self._get_module(self._class_to_module[name])
2263 value = getattr(module, name)
27 frames
RuntimeError: operator torchvision::nms does not exist
The above exception was the direct cause of the following exception:
ModuleNotFoundError Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py in getattr(self, name)
2348 ) from e
2349 else:
-> 2350 raise ModuleNotFoundError(
2351 f"Could not import module '{name}'. Are this object's requirements defined correctly?"
2352 ) from e
ModuleNotFoundError: Could not import module 'Gemma4VisionConfig'. Are this object's requirements defined correctly?
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
%pip uninstall -y torch torchvision torchaudio
%pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
%pip install -U transformers accelerate
dssfdsfdsfdsffdsfdsf
Anyone running VLLM, as of yesterday you can use the gemma tagged containers:
https://hub.docker.com/layers/vllm/vllm-openai/gemma-cu129/images/sha256-e24d3d8e367ebc2a6ccfc8cf3cecee858a30a6c66b5b1642d85363cd745e001e