---
title: MobileCLIP Image Classifier
emoji: 📸
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# MobileCLIP-B Image Classifier

Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.

## Key Features

### Core Capabilities
- **Zero-Shot Classification**: Upload any image for instant classification without model retraining
- **Dynamic Label Management**: Add, remove, and update classification labels on the fly
- **Interactive Results**: Visual confidence scores with sortable data tables
- **Optimized Performance**: Sub-30 ms inference on GPU with re-parameterized MobileOne blocks
- **Secure Admin Panel**: Token-protected label management interface
- **Hub Persistence**: Optional versioned label storage on Hugging Face Hub

### API Access
- **REST API**: Fully accessible via Gradio's automatic API endpoints
- **Base64 Support**: Direct base64 image input for backend integration
- **Batch Processing**: Efficient handling of multiple classification requests

## Architecture

### Components
- **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
- **`handler.py`**: Core model management, inference logic, and label operations
- **`reparam.py`**: MobileOne re-parameterization for optimized inference
- **`items.json`**: Default label catalog with metadata

### Model Details
- **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
- **Text Encoder**: Optimized CLIP text transformer
- **Embedding Cache**: Pre-computed text embeddings for fast inference
- **Device Support**: Automatic GPU/CPU detection with float16 optimization

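
The scoring step implied by the model details above (one image embedding compared against cached text embeddings) can be sketched in a few lines. This is a minimal illustration, not the code in `handler.py`: the `classify` helper is hypothetical, the logit scale of 100 is an assumed CLIP-style value, and plain Python lists stand in for tensors.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def classify(image_emb, text_embs, labels, top_k=10, logit_scale=100.0):
    """Rank labels by softmax over scaled cosine similarity (hypothetical helper)."""
    image_emb = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image_emb, l2_normalize(t)))
        for t in text_embs
    ]
    # Numerically stable softmax: subtract the max logit before exponentiating
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    scored = zip(labels, (e / total for e in exps))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 3-dimensional embeddings; real model embeddings are far higher-dimensional
cached_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
labels = ["cat", "dog", "bicycle"]
predictions = classify([0.9, 0.1, 0.0], cached_text_embs, labels, top_k=2)
```

Because the text embeddings depend only on the label prompts, they can be computed once and reused for every image, which is what makes a cache worthwhile.
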
## Quick Start

### Environment Variables

Configure these under your Space's **Settings → Variables and secrets**:

| Variable | Description | Required |
|----------|-------------|----------|
| `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
| `HF_WRITE_TOKEN` | Token with write permissions to the dataset repo | No |
| `HF_READ_TOKEN` | Token with read permissions (defaults to the write token) | No |

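
As an illustration of how the variables in the table might be read at startup, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, and treating Hub persistence as enabled only when both a repo and a write token are present is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    admin_token: Optional[str]
    label_repo: Optional[str]
    write_token: Optional[str]
    read_token: Optional[str]

    @property
    def hub_enabled(self) -> bool:
        # Assumption: persistence needs both a target repo and a write token
        return bool(self.label_repo and self.write_token)

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the table; empty strings count as unset."""
    get = lambda name: env.get(name) or None
    write_token = get("HF_WRITE_TOKEN")
    return Settings(
        admin_token=get("ADMIN_TOKEN"),
        label_repo=get("HF_LABEL_REPO"),
        write_token=write_token,
        # Per the table, the read token defaults to the write token
        read_token=get("HF_READ_TOKEN") or write_token,
    )

# Example with an explicit mapping instead of the real process environment
settings = load_settings({"HF_LABEL_REPO": "user/labels", "HF_WRITE_TOKEN": "hf_example"})
```

Passing the environment in as a mapping keeps the function easy to exercise without touching real secrets.
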
### Usage Examples

#### Web Interface
1. Navigate to the Space URL
2. Upload an image in the Classification tab
3. Adjust the number of top-k results (default: 10)
4. View ranked predictions with confidence scores

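#### API Usage

**Standard Classification:**
```python
import requests

SPACE_URL = "YOUR_SPACE_URL"  # replace with your Space's base URL

# Open the image in a context manager so the file handle is closed afterwards
with open("photo.jpg", "rb") as f:
    response = requests.post(
        f"{SPACE_URL}/api/classify_image",
        files={"image": f},
        data={"top_k": 5},
    )
response.raise_for_status()  # fail loudly on HTTP errors
results = response.json()
```
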
**Base64 Input:**
```python
import base64
import requests

SPACE_URL = "YOUR_SPACE_URL"  # replace with your Space's base URL

# Encode the raw image bytes as an ASCII base64 string
with open("photo.jpg", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    f"{SPACE_URL}/api/classify_base64",
    json={"image": img_base64, "top_k": 10},
)
response.raise_for_status()  # fail loudly on HTTP errors
results = response.json()
```

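
On the receiving side, a base64 endpoint has to turn the string back into bytes before the image can be opened. The following is a minimal sketch of that step under stated assumptions: `decode_image_payload` is a hypothetical helper, and both the optional data-URL prefix handling and the 10 MB cap are illustrative choices, not behavior confirmed for this app.

```python
import base64
import binascii

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed upload cap, not taken from the app

def decode_image_payload(payload: str) -> bytes:
    """Decode a base64 image string, tolerating an optional data-URL prefix."""
    if payload.startswith("data:") and "," in payload:
        # Drop a prefix such as "data:image/jpeg;base64,"
        payload = payload.split(",", 1)[1]
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is not valid base64") from exc
    if not raw:
        raise ValueError("image payload is empty")
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the size limit")
    return raw

# Round-trip a few placeholder bytes through the encoder and decoder
encoded = base64.b64encode(b"\xff\xd8\xff\xe0 fake jpeg bytes").decode("ascii")
image_bytes = decode_image_payload("data:image/jpeg;base64," + encoded)
```

The returned bytes could then be wrapped in `io.BytesIO` and handed to an image library for decoding.
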
## Admin Operations

### Label Management

Authenticated admins can perform the following operations:

#### Add Labels
```json
{
  "op": "upsert_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "items": [
    {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
    {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
  ]
}
```

#### Reload Specific Version
```json
{
  "op": "reload_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "version": 5
}
```

#### Remove Labels
```json
{
  "op": "remove_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "ids": [100, 101]
}
```

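
To show how payloads like these could be processed, here is a minimal sketch of a token-checked dispatcher. `handle_admin_request` and its response shape are hypothetical, labels are held in a plain dict, and `reload_labels` is left out because it depends on Hub access; the actual logic in `handler.py` may differ.

```python
import hmac

def handle_admin_request(payload, labels, admin_token):
    """Apply an admin operation to `labels`, a dict mapping id -> item (hypothetical)."""
    supplied = str(payload.get("token", ""))
    # Constant-time comparison avoids leaking token contents through timing
    if not admin_token or not hmac.compare_digest(supplied.encode(), admin_token.encode()):
        return {"ok": False, "error": "unauthorized"}

    op = payload.get("op")
    if op == "upsert_labels":
        for item in payload.get("items", []):
            labels[item["id"]] = item  # insert new ids, overwrite existing ones
    elif op == "remove_labels":
        for label_id in payload.get("ids", []):
            labels.pop(label_id, None)  # ignore ids that are not present
    else:
        return {"ok": False, "error": f"unsupported op: {op}"}
    return {"ok": True, "count": len(labels)}

labels = {}
result = handle_admin_request(
    {"op": "upsert_labels", "token": "s3cret",
     "items": [{"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"}]},
    labels,
    admin_token="s3cret",
)
```

Checking the token before looking at `op` means an unauthenticated caller learns nothing about which operations exist.
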
### Label Deduplication
- Automatic case-insensitive name deduplication
- Prevents duplicate entries (e.g., "cat", "Cat", and "CAT" are treated as the same label)
- ID-based deduplication for consistent label management

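
The two deduplication rules above can be combined in a single pass. This is a minimal sketch: `merge_labels` is a hypothetical helper, and keeping the first occurrence when a duplicate is found is an assumed policy, not necessarily what the app does.

```python
def name_key(item):
    """Normalize a label name for case-insensitive comparison."""
    return item["name"].strip().casefold()

def merge_labels(existing, incoming):
    """Merge label dicts, skipping ids or names already present (first one wins)."""
    merged = list(existing)
    seen_ids = {item["id"] for item in merged}
    seen_names = {name_key(item) for item in merged}
    for item in incoming:
        key = name_key(item)
        if item["id"] in seen_ids or key in seen_names:
            continue  # duplicate by id or by case-insensitive name
        merged.append(item)
        seen_ids.add(item["id"])
        seen_names.add(key)
    return merged

existing = [{"id": 1, "name": "cat"}]
incoming = [
    {"id": 2, "name": "Cat"},     # same name, different case
    {"id": 1, "name": "kitten"},  # same id
    {"id": 3, "name": "dog"},
    {"id": 4, "name": "DOG"},     # duplicate of the entry just added
]
merged = merge_labels(existing, incoming)
```
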
## Hub Integration

When configured with `HF_LABEL_REPO` and tokens, the system automatically:

1. **Saves Snapshots**: Each label update creates a versioned snapshot
   - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
   - `snapshots/v{N}/meta.json`: Label metadata and model info
   - `snapshots/latest.json`: Points to the current version
2. **Loads on Startup**: Fetches the latest snapshot or a specified version
3. **Fallback**: Uses the local `items.json` if the Hub is unavailable

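
The load-then-fall-back flow above can be sketched without any network code by injecting the fetch step. In this sketch `resolve_snapshot` and `fetch_text` are hypothetical names, and the `"version"` key inside `latest.json` is an assumed field name; only the file paths come from the layout listed above.

```python
import json

def snapshot_paths(version):
    """Repo-relative paths for one snapshot, following the layout listed above."""
    base = f"snapshots/v{version}"
    return {"embeddings": f"{base}/embeddings.safetensors", "meta": f"{base}/meta.json"}

def resolve_snapshot(fetch_text, version=None):
    """Return the paths to load, or None to signal the local items.json fallback.

    `fetch_text(path)` is a hypothetical callable that returns a file's text
    from the Hub repo and raises OSError when the file or Hub is unavailable.
    """
    try:
        if version is None:
            # Assumption: latest.json stores the current version under "version"
            version = json.loads(fetch_text("snapshots/latest.json"))["version"]
        fetch_text(f"snapshots/v{version}/meta.json")  # confirm the snapshot exists
    except (OSError, KeyError, ValueError):
        return None
    return snapshot_paths(version)

# In-memory stand-in for a Hub dataset repo
fake_repo = {"snapshots/latest.json": '{"version": 5}', "snapshots/v5/meta.json": "{}"}

def fetch_from_fake_repo(path):
    if path not in fake_repo:
        raise FileNotFoundError(path)  # FileNotFoundError is a subclass of OSError
    return fake_repo[path]

paths = resolve_snapshot(fetch_from_fake_repo)
```

In a real deployment, `fetch_text` could wrap a download call from the `huggingface_hub` library; the sketch keeps it abstract so the fallback logic can be tested offline.
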
## Default Label Catalog

The bundled `items.json` includes 50+ kid-friendly objects with:
- Unique IDs and display names
- CLIP-optimized prompts
- Category metadata
- Fun facts and rarity ratings

Categories include animals, toys, food, vehicles, nature, and everyday objects.

## Performance Optimization

- **GPU Acceleration**: Automatic CUDA detection with float16 inference
- **CPU Fallback**: Graceful degradation with float32 precision
- **Embedding Cache**: Pre-computed text embeddings updated on label changes
- **Re-parameterization**: MobileOne blocks optimized for inference speed
- **Batch Processing**: Efficient matrix operations for multi-label scoring

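
Re-parameterization relies on the fact that a linear layer followed by batch normalization is a single affine map at inference time, so the two can be folded into one weight and one bias (MobileOne additionally merges its parallel branches). The sketch below shows only that folding arithmetic for a single channel in plain Python; `fold_batchnorm` is a hypothetical name and this is not the contents of `reparam.py`.

```python
import math

def fold_batchnorm(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BatchNorm statistics into a preceding per-channel weight and bias."""
    scale = gamma / math.sqrt(running_var + eps)
    return weight * scale, beta + (bias - running_mean) * scale

def layer_then_bn(x, weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Reference path: affine layer followed by inference-mode BatchNorm."""
    y = weight * x + bias
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta

params = dict(weight=0.8, bias=0.1, gamma=1.5, beta=-0.2, running_mean=0.3, running_var=4.0)
fused_weight, fused_bias = fold_batchnorm(**params)

# The fused layer should reproduce the two-step reference output
x = 2.0
fused_out = fused_weight * x + fused_bias
reference_out = layer_then_bn(x, **params)
```

After folding, inference runs one multiply-add per channel instead of a layer plus a normalization step, which is where the speedup comes from.
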
## Security Considerations

- **Token Protection**: Admin operations require `ADMIN_TOKEN`
- **Private Datasets**: Keep label repos private for sensitive applications
- **Input Validation**: Automatic sanitization of uploaded images
- **Memory Management**: Images processed and discarded after inference

## License

- **Model Weights**: Apple Sample Code License (ASCL)
- **Interface Code**: MIT License

## Contributing

Contributions are welcome. Areas for improvement:
- Additional label management features
- Performance optimizations
- Extended API capabilities
- Multi-language support

## Resources

- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces](https://huggingface.co/spaces)