Dolphin OCR Deployment on Hugging Face Inference Toolkit

This guide provides step-by-step instructions to deploy the Bytedance Dolphin OCR model using the Hugging Face Inference Toolkit with GPU support.


πŸ”Ή Prerequisites

  • Docker installed
  • An NVIDIA GPU on your local machine
  • A Hugging Face account
  • Basic familiarity with command-line tools
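
Before starting, you can quickly confirm that PyTorch sees your GPU (a minimal check, assuming PyTorch is installed on the host; this only verifies CUDA visibility locally, not inside Docker):

import torch

# Should print True and the GPU name if CUDA is available on the host
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))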

πŸ”’ Step 1: Duplicate the Dolphin Model Repository

  1. Visit: https://huggingface.co/spaces/huggingface-projects/repo_duplicator
  2. Enter the source repo, in this case Bytedance/Dolphin.
  3. Name your new repo: luquiT4/DolphinInference (or any name you prefer).
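
If you prefer to script the duplication instead of using the web UI, a rough alternative with the huggingface_hub library is sketched below. This is an assumption on my part, not part of the official flow, and unlike the Repo Duplicator it downloads the full weights to your machine before re-uploading them:

from huggingface_hub import HfApi, snapshot_download

# Requires `huggingface-cli login` (or an HF token) with write access to your account
local_dir = snapshot_download("Bytedance/Dolphin")  # download the source model locally
api = HfApi()
api.create_repo("luquiT4/DolphinInference", repo_type="model", exist_ok=True)
api.upload_folder(folder_path=local_dir, repo_id="luquiT4/DolphinInference")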

πŸ”’ Step 2: Add the handler to the Model Repository

The toolkit documentation mentions that these two files are what make a model repo compatible with the custom handler mechanism: https://github.com/huggingface/huggingface-inference-toolkit/#custom-handler-and-dependency-support

  • handler.py (Custom inference handler)
  • requirements.txt (Dependencies)
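
In this deployment only handler.py ended up being required (see the note after the code), but if your handler needed extra packages, requirements.txt would be a standard pip requirements file. A minimal sketch matching the handler's imports:

# requirements.txt (only needed if handler.py uses packages missing from the base image)
transformers
torch
pillow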

To add them:

  1. Add a new file named handler.py to the new repo (via the web UI).
  2. Paste the following:
import base64
import io
from typing import Dict, Any

import torch
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel


class EndpointHandler:
    def __init__(self, path=""):
        # Load processor and model from the provided path or model ID
        self.processor = AutoProcessor.from_pretrained(path or "bytedance/Dolphin")
        self.model = VisionEncoderDecoderModel.from_pretrained(path or "bytedance/Dolphin")

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        self.model.eval()
        self.model = self.model.half()  # Half precision for speed

        self.tokenizer = self.processor.tokenizer

    def decode_base64_image(self, image_base64: str) -> Image.Image:
        image_bytes = base64.b64decode(image_base64)
        return Image.open(io.BytesIO(image_bytes)).convert("RGB")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Check for image input
        if "inputs" not in data:
            return {"error": "No inputs provided"}

        image_input = data["inputs"]

        # Support both base64 image strings and raw images (Hugging Face supports both)
        if isinstance(image_input, str):
            try:
                image = self.decode_base64_image(image_input)
            except Exception as e:
                return {"error": f"Invalid base64 image: {str(e)}"}
        else:
            image = image_input  # Assume PIL-compatible image

        # Optional: Custom prompt (default: text reading)
        prompt = data.get("prompt", "Read text in the image.")
        full_prompt = f"<s>{prompt} <Answer/>"

        # Preprocess inputs
        inputs = self.processor(image, return_tensors="pt")
        pixel_values = inputs.pixel_values.half().to(self.device)

        prompt_ids = self.tokenizer(full_prompt, add_special_tokens=False, return_tensors="pt").input_ids.to(self.device)
        decoder_attention_mask = torch.ones_like(prompt_ids).to(self.device)

        # Inference
        outputs = self.model.generate(
            pixel_values=pixel_values,
            decoder_input_ids=prompt_ids,
            decoder_attention_mask=decoder_attention_mask,
            min_length=1,
            max_length=4096,
            pad_token_id=self.tokenizer.pad_token_id,
            eos_token_id=self.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[self.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
            do_sample=False,
            num_beams=1,
        )

        sequence = self.tokenizer.batch_decode(outputs.sequences, skip_special_tokens=False)[0]
        # Clean up
        generated_text = sequence.replace(full_prompt, "").replace("<pad>", "").replace("</s>", "").strip()

        return {"text": generated_text}

This handler was drafted with the help of ChatGPT and the sources linked above.

In this case the endpoint works with handler.py alone; no requirements.txt was needed.
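
Before building the Docker image, you can sanity-check the handler on its own (a rough local test, assuming handler.py is in the current directory and the duplicated repo is publicly downloadable; pass path="" to fall back to the original model):

import base64
from handler import EndpointHandler

# Instantiate the handler; it loads the processor and model from the given repo
handler = EndpointHandler(path="luquiT4/DolphinInference")

# Build the same payload shape the endpoint will receive: base64 image + optional prompt
with open("imagewithtext.png", "rb") as f:
    payload = {
        "inputs": base64.b64encode(f.read()).decode("utf-8"),
        "prompt": "Read text in the image.",
    }

print(handler(payload))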


πŸ”’ Step 3: Build the Hugging Face Inference Toolkit Docker Image

  1. Clone the toolkit:
git clone https://github.com/huggingface/huggingface-inference-toolkit.git
cd huggingface-inference-toolkit
  2. Important: If you are on Windows, use WSL or Linux to avoid line-ending issues (^M: bad interpreter).

  3. Build the GPU Docker image:

make inference-pytorch-gpu
# under the hood, this runs:
# docker build -t integration-test-pytorch:gpu -f docker/Dockerfile.pytorch .

πŸ”’ Step 4: Run the Inference Server with Dolphin Model

docker run -ti -p 5001:5000 --gpus all \
  -e HF_MODEL_ID=luquiT4/DolphinInference \
  -e HF_TASK=image-to-text \
  integration-test-pytorch:gpu
  • HF_MODEL_ID = your Hugging Face model name
  • HF_TASK = task type (image-to-text)
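
The first start takes a while because the container downloads the model. A small helper to wait until the server answers (a sketch; it assumes recent toolkit versions, which expose a GET /health route β€” adjust the URL if your build differs):

import time
import requests

# Poll until the model is loaded and the server responds
for _ in range(60):
    try:
        if requests.get("http://localhost:5001/health", timeout=2).ok:
            print("server is ready")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(5)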

πŸ”’ Step 5: Test the Endpoint

  1. Send an inference request:
curl --request POST \
  --url http://localhost:5001/ \
  --header 'accept: application/json' \
  --header 'content-type: application/octet-stream' \
  --data-binary '@C:\path\to\imagewithtext.png'
  2. Enjoy a successful response
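
The same request from Python, plus the JSON/base64 variant that handler.py also accepts (a sketch; it assumes the server from Step 4 is running on localhost:5001):

import base64
import requests

URL = "http://localhost:5001/"

# Option 1: raw image bytes, mirroring the curl call above
with open("imagewithtext.png", "rb") as f:
    r = requests.post(URL, data=f.read(),
                      headers={"content-type": "application/octet-stream"})
print(r.json())

# Option 2: base64 string in JSON, handled by decode_base64_image in handler.py
with open("imagewithtext.png", "rb") as f:
    payload = {"inputs": base64.b64encode(f.read()).decode("utf-8"),
               "prompt": "Read text in the image."}
print(requests.post(URL, json=payload).json())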

πŸ”’ Step 6 (Coming Soon): Deploy to Azure Serverless Function as an API

  • Use serverless GPU (NC T4 v3) for low-cost inference.
  • Configure scale-to-zero in Azure Container Apps to avoid idle GPU charges.
  • Monitor with Azure budgets and alerts.


πŸ”Ή Troubleshooting

  • 404 for requirements.txt β†’ (Optional) Create a requirements.txt in your HF model repo
  • Safetensors HeaderTooLarge β†’ Clone the repo in the cloud using the Hugging Face Repo Duplicator
  • ^M: bad interpreter β†’ Build the Docker image on WSL or Linux

πŸ‘ Useful Links


You are now ready to deploy and run Dolphin OCR as a custom Hugging Face Inference Endpoint!
