| --- |
| license: apple-amlr |
| license_name: apple-ascl |
| license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data |
| library_name: mobileclip |
| --- |
| |
| # 📸 MobileCLIP-B Zero-Shot Image Classifier |
| ### Hugging Face Inference Endpoint |
|
|
| > **Production-ready wrapper** around Apple’s MobileCLIP-B checkpoint. |
| > Handles image → text similarity in a single fast call. |
|
|
| --- |
|
|
| ## 📑 Sidebar |
|
|
| - [Features](#-features) |
| - [Repository layout](#-repository-layout) |
| - [Quick start (local smoke-test)](#-quick-start-local-smoke-test) |
| - [Calling the deployed endpoint](#-calling-the-deployed-endpoint) |
| - [How it works](#-how-it-works) |
| - [Updating the label set](#-updating-the-label-set) |
| - [License](#-license) |
|
|
| --- |
|
|
| ## ✨ Features |
| | | This repo | |
| |------------------------------|-----------| |
| | **Model** | MobileCLIP-B (`datacompdr` checkpoint) | |
| | **Branch fusion** | `reparameterize_model` baked in | |
| | **Mixed-precision** | FP16 on GPU, FP32 on CPU | |
| | **Pre-computed text feats** | One-time encoding of prompts in `items.json` | |
| **Per-request work** | _Only_ image decoding → `encode_image` → softmax | |
| | **Latency (A10G)** | < 30 ms once the image arrives | |
| |
| --- |
| |
| ## 📁 Repository layout |
| |
| | Path | Purpose | |
| |--------------------|------------------------------------------------------------------| |
| | `handler.py` | HF entry-point (loads model + text cache, serves requests) | |
| | `reparam.py` | 60-line stand-alone copy of Apple’s `reparameterize_model` | |
| | `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) | |
| `items.json` | Your label set (JSON array of objects with `id`, `name`, `prompt`) |
| | `README.md` | This document | |
|
|
| --- |
|
|
| ## 🚀 Quick start (local smoke-test) |
|
|
| ```bash |
| python -m venv venv && source venv/bin/activate |
| pip install -r requirements.txt |
| |
| python - <<'PY' |
| import base64, json, handler, pathlib |
| app = handler.EndpointHandler() |
| |
| img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode() |
| print(app({"inputs": {"image": img_b64}})[:5]) # top-5 classes |
| PY |
| ``` |
|
|
| --- |
|
|
| ## 🌐 Calling the deployed endpoint |
|
|
| ```bash |
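| # Export the variables so the Python child process can read them. |
| export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud" |
| export TOKEN="hf_xxxxxxxxxxxxxxxxx" |
| IMG="cat.jpg" |
| |
| # The image path is passed as an argument; the heredoc supplies the script on stdin. |
| python - "$IMG" <<'PY' |
| import base64, json, os, sys |
| import requests  # client-side dependency: pip install requests |
| |
| url = os.environ["ENDPOINT"] |
| token = os.environ["TOKEN"] |
| img = sys.argv[1]  # first CLI argument: path to the image file |
| |
| with open(img, "rb") as fh: |
|     payload = {"inputs": {"image": base64.b64encode(fh.read()).decode()}} |
| |
| resp = requests.post( |
|     url, |
|     headers={ |
|         "Authorization": f"Bearer {token}", |
|         "Content-Type": "application/json", |
|         "Accept": "application/json", |
|     }, |
|     json=payload, |
|     timeout=60, |
| ) |
| resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body |
| print(json.dumps(resp.json()[:5], indent=2))  # top-5 classes |
| PY |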
| ``` |
|
|
| *Response example* |
|
|
| ```json |
| [ |
| { "id": 23, "label": "cat", "score": 0.92 }, |
| { "id": 11, "label": "tiger cat", "score": 0.05 }, |
| { "id": 48, "label": "siamese cat", "score": 0.02 } |
| ] |
| ``` |
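
A client will often want only the confident predictions rather than the full ranked list. The sketch below is one way to do that with the response shape shown above; `confident_predictions`, `min_score`, and `top_k` are hypothetical names and thresholds chosen for illustration, not part of the endpoint.

```python
def confident_predictions(predictions, min_score=0.10, top_k=3):
    """Keep at most top_k predictions whose score reaches min_score.

    The thresholds are illustrative defaults, not values used by the endpoint.
    """
    # Sort defensively in case the caller passes an unsorted list.
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p for p in ranked if p["score"] >= min_score][:top_k]


# The response example from this README.
response = [
    {"id": 23, "label": "cat", "score": 0.92},
    {"id": 11, "label": "tiger cat", "score": 0.05},
    {"id": 48, "label": "siamese cat", "score": 0.02},
]

best = confident_predictions(response)
print([p["label"] for p in best])
```

Lowering `min_score` trades precision for recall; with a large label set the softmax mass is spread thinner, so a lower cut-off may be appropriate.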
|
|
| --- |
|
|
| ## ⚙️ How it works |
|
|
| 1. **Startup (runs once per replica)** |
|
|
| * Downloads / loads MobileCLIP-B (`datacompdr`). |
| * Fuses MobileOne branches via `reparam.py`. |
| * Reads `items.json` and encodes every prompt → `[N,512]` tensor. |
|
|
| 2. **Per request** |
|
|
| * Decodes base-64 JPEG/PNG. |
| * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalise). |
| * Encodes the image, L2-normalises the embedding, computes cosine similarity against the cached text matrix, and applies softmax. |
| * Returns sorted `[{id, label, score}, …]`. |
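
The scoring step described above can be sketched in plain Python. This is a minimal illustration, not the code in `handler.py`: `rank_labels` and `l2_normalise` are hypothetical helpers, the `logit_scale` of 100 is an assumed temperature, and tiny 3-dimensional vectors stand in for the real 512-dimensional embeddings.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_labels(image_feat, text_feats, labels, logit_scale=100.0):
    """Return labels sorted by softmax over scaled cosine similarities.

    logit_scale is an assumed temperature, not a value read from handler.py.
    """
    img = l2_normalise(image_feat)
    sims = [sum(a * b for a, b in zip(img, l2_normalise(t))) for t in text_feats]
    logits = [logit_scale * s for s in sims]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    scored = [
        {"id": lab["id"], "label": lab["name"], "score": e / total}
        for lab, e in zip(labels, exps)
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)


# Toy features: the image vector points mostly along the "cat" prompt vector.
labels = [{"id": 0, "name": "cat"}, {"id": 1, "name": "dog"}]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
results = rank_labels([0.9, 0.1, 0.0], text_feats, labels)
print(results[0]["label"])
```

Because the text features are computed once at startup, only the image side of this computation has to run per request.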
|
|
| --- |
|
|
| ## 🔄 Updating the label set |
|
|
| Simply edit `items.json`, push, and redeploy. |
|
|
| ```json |
| [ |
| { "id": 0, "name": "cat", "prompt": "a photo of a cat" }, |
| { "id": 1, "name": "dog", "prompt": "a photo of a dog" } |
| ] |
| ``` |
|
|
| No code changes are required; the handler re-encodes prompts at start-up. |
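
Since a malformed label file would only surface at start-up, it can help to check `items.json` before pushing. The following is a sketch of such a check under the schema shown above; `validate_items` and `REQUIRED_KEYS` are hypothetical names, and the handler itself may or may not perform similar validation.

```python
import json

REQUIRED_KEYS = ("id", "name", "prompt")


def validate_items(raw_json):
    """Parse a label set and raise ValueError on malformed entries."""
    items = json.loads(raw_json)
    if not isinstance(items, list) or not items:
        raise ValueError("items.json must be a non-empty JSON array")
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"entry {index} is missing {missing}")
        if item["id"] in seen_ids:
            raise ValueError(f"entry {index} repeats id {item['id']}")
        seen_ids.add(item["id"])
    return items


# Inline sample; in practice, pass the contents of items.json instead.
sample = """[
  {"id": 0, "name": "cat", "prompt": "a photo of a cat"},
  {"id": 1, "name": "dog", "prompt": "a photo of a dog"}
]"""
items = validate_items(sample)
print(len(items), "labels OK")
```

To check the real file, read it with `pathlib.Path("items.json").read_text()` and pass the result to `validate_items`.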
|
|
| --- |
|
|
| ## ⚖️ License |
|
|
| * **Weights / data** — Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data)) |
| * **This wrapper code** — MIT |
|
|
| --- |
|
|
| <div align="center"><sub>Maintained with ❤️ by Your-Team — Aug 2025</sub></div> |
|
|
|
|