| # CLAUDE.md | |
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
| ## Project Overview | |
| This is a Hugging Face Space that serves as the **complete backend** for the Piclets Discovery game. It orchestrates AI services, handles Piclet generation, and manages persistent storage. | |
| **Core Concept**: Each real-world object has ONE canonical Piclet! Players scan objects with photos, and the server generates Pokemon-style creatures using AI, tracking canonical discoveries and variations (e.g., "velvet pillow" is a variation of the canonical "pillow"). | |
| **Architecture Philosophy**: The server handles ALL AI orchestration securely. The frontend is a pure UI that makes a single API call per scan. This prevents client-side manipulation and ensures fair play. | |
| ## Architecture | |
| ### Storage System | |
| - **HuggingFace Dataset**: `Fraser/piclets` (public dataset repository) | |
| - **Structure**: | |
| ``` | |
| piclets/ | |
| {normalized_object_name}.json # e.g., pillow.json | |
| users/ | |
| {username}.json # User profiles | |
| metadata/ | |
| stats.json # Global statistics | |
| leaderboard.json # Top discoverers | |
| ``` | |
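As a rough illustration of the layout above, the repository-relative paths could be built as follows. The helper names (`piclet_path`, `user_path`, `METADATA_PATHS`) are hypothetical and are not taken from the server code.

```python
# Hypothetical helpers mirroring the dataset layout described above.
METADATA_PATHS = {
    "stats": "metadata/stats.json",
    "leaderboard": "metadata/leaderboard.json",
}


def piclet_path(normalized_name: str) -> str:
    """Path of the JSON file holding one object's canonical Piclet and variations."""
    return f"piclets/{normalized_name}.json"


def user_path(username: str) -> str:
    """Path of a user's profile file."""
    return f"users/{username}.json"
```

For example, `piclet_path("pillow")` gives `piclets/pillow.json`, matching the comment in the tree.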
| ### Object Normalization | |
| Objects are normalized for consistent storage: | |
| - Convert to lowercase | |
| - Remove articles (the, a, an) | |
| - Handle pluralization (pillows → pillow) | |
| - Replace spaces with underscores | |
| - Remove special characters | |
| Examples: | |
| - "The Blue Pillow" → `pillow` | |
| - "wooden chairs" → `wooden_chair` | |
| - "glasses" → `glass` (special case handling) | |
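A minimal sketch of these normalization rules is shown below. The function names, the special-case table, and the naive trailing-`s` rule are assumptions for illustration; the real implementation may differ. This sketch keeps adjectives, so it would turn "The Blue Pillow" into `blue_pillow`. The first example above maps that input to `pillow`, which suggests descriptive attributes are stripped elsewhere, a step this sketch does not attempt.

```python
import re

ARTICLES = {"the", "a", "an"}
# Assumed special-case table; only "glasses" is named in this document.
SPECIAL_SINGULARS = {"glasses": "glass"}


def singularize(word: str) -> str:
    """Very rough singularization: special cases first, then drop a trailing 's'."""
    if word in SPECIAL_SINGULARS:
        return SPECIAL_SINGULARS[word]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize_object_name(name: str) -> str:
    """Lowercase, drop special characters and articles, singularize, join with '_'."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    words = [w for w in cleaned.split() if w not in ARTICLES]
    if words:
        # Only the last word (the head noun) is singularized.
        words[-1] = singularize(words[-1])
    return "_".join(words)
```

With these rules, `normalize_object_name("wooden chairs")` yields `wooden_chair` and `normalize_object_name("glasses")` yields `glass`.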
| ### Piclet Data Structure | |
| ```json | |
| { | |
| "canonical": { | |
| "objectName": "pillow", | |
| "typeId": "pillow_canonical", | |
| "discoveredBy": "username", | |
| "discoveredAt": "2024-07-26T10:30:00", | |
| "scanCount": 42, | |
| "picletData": { | |
| // Full Piclet instance data | |
| } | |
| }, | |
| "variations": [ | |
| { | |
| "typeId": "pillow_001", | |
| "attributes": ["velvet", "blue"], | |
| "discoveredBy": "username2", | |
| "discoveredAt": "2024-07-26T11:00:00", | |
| "scanCount": 5, | |
| "picletData": { | |
| // Full variation data | |
| } | |
| } | |
| ] | |
| } | |
| ``` | |
| ## API Endpoints | |
| The frontend only needs these **5 public endpoints**: | |
| ### 1. **generate_piclet** (Scanner) | |
| Complete Piclet generation workflow - the main endpoint. | |
| - **Input**: | |
| - `image`: User's photo (File) | |
| - `hf_token`: User's HuggingFace OAuth token (string) | |
| - **Process**: | |
| 1. Verifies `hf_token` → gets user info | |
| 2. Uses token to connect to **JoyCaption** → generates detailed image description | |
| 3. Uses token to call **GPT-OSS-120B** → generates Pokemon concept (object, variation, stats, description) | |
| 4. Parses concept to extract structured data | |
| 5. Uses token to call **Flux-Schnell** → generates Piclet image | |
| 6. Checks dataset for canonical/variation match | |
| 7. Saves to dataset with user attribution | |
| 8. Updates user profile (discoveries, rarity score) | |
| - **Returns**: | |
| ```json | |
| { | |
| "success": true, | |
| "piclet": {/* complete Piclet data */}, | |
| "discoveryStatus": "new" | "variation" | "existing", | |
| "canonicalId": "pillow_canonical", | |
| "message": "Congratulations! You discovered the first pillow Piclet!" | |
| } | |
| ``` | |
| - **Security**: Uses user's token to call AI services, consuming THEIR GPU quota (not the server's) | |
| ### 2. **get_user_piclets** (User Collection) | |
| Get user's discovered Piclets and stats. | |
| - **Input**: `hf_token` (string) | |
| - **Returns**: | |
| ```json | |
| { | |
| "success": true, | |
| "piclets": [{/* list of discoveries */}], | |
| "stats": { | |
| "username": "...", | |
| "totalFinds": 42, | |
| "uniqueFinds": 15, | |
| "rarityScore": 1250 | |
| } | |
| } | |
| ``` | |
| ### 3. **get_object_details** (Object Data) | |
| Get complete object information (canonical + all variations). | |
| - **Input**: `object_name` (string, e.g., "pillow", "macbook") | |
| - **Returns**: | |
| ```json | |
| { | |
| "success": true, | |
| "objectName": "pillow", | |
| "canonical": {/* canonical data */}, | |
| "variations": [{/* variation 1 */}, {/* variation 2 */}], | |
| "totalScans": 157, | |
| "variationCount": 8 | |
| } | |
| ``` | |
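One plausible way to derive `totalScans` and `variationCount` from a stored record is sketched below. It assumes `totalScans` is the canonical `scanCount` plus the `scanCount` of every variation; the document does not confirm that definition, and `summarize_object` is a hypothetical name.

```python
def summarize_object(record: dict) -> dict:
    """Build a get_object_details-style response from one stored Piclet record."""
    canonical = record["canonical"]
    variations = record.get("variations", [])
    # Assumption: total scans = canonical scans + scans of all variations.
    total_scans = canonical.get("scanCount", 0) + sum(
        v.get("scanCount", 0) for v in variations
    )
    return {
        "success": True,
        "objectName": canonical["objectName"],
        "canonical": canonical,
        "variations": variations,
        "totalScans": total_scans,
        "variationCount": len(variations),
    }
```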
| ### 4. **get_recent_activity** (Activity Feed) | |
| Recent discoveries across all users. | |
| - **Input**: `limit` (int, default 20) | |
| - **Returns**: List of recent discoveries with timestamps | |
| ### 5. **get_leaderboard** (Top Users) | |
| Top discoverers by rarity score. | |
| - **Input**: `limit` (int, default 10) | |
| - **Returns**: Ranked users with stats | |
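Ranking by rarity score can be sketched as a sort over user profiles. The `rarityScore` and `username` fields come from the stats shape shown earlier; the `rank` field and the `top_users` name are assumptions for illustration.

```python
def top_users(profiles: list, limit: int = 10) -> list:
    """Return the top `limit` profiles by rarityScore, highest first, with a 1-based rank."""
    ranked = sorted(profiles, key=lambda p: p.get("rarityScore", 0), reverse=True)
    return [{"rank": i, **p} for i, p in enumerate(ranked[:limit], start=1)]
```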
| --- | |
| **Internal Functions** (not exposed to frontend): | |
| - `search_piclet()`, `create_canonical()`, `create_variation()`, `increment_scan_count()` - Used internally by `generate_piclet()` | |
| ## Rarity System | |
| Scan count determines rarity: | |
| - **Legendary**: ≤ 5 scans | |
| - **Epic**: 6-20 scans | |
| - **Rare**: 21-50 scans | |
| - **Uncommon**: 51-100 scans | |
| - **Common**: > 100 scans | |
| Rarity scoring for leaderboard: | |
| - Canonical discovery: +100 points | |
| - Variation discovery: +50 points | |
| - Additional bonuses based on rarity tier | |
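The tier thresholds and base points above translate directly into a lookup. In this sketch the names are hypothetical, the tier bonuses are omitted because their values are not specified here, and awarding 0 base points for an `"existing"` scan is an assumption.

```python
# Upper scan-count bound (inclusive) for each tier, rarest first.
RARITY_TIERS = [(5, "Legendary"), (20, "Epic"), (50, "Rare"), (100, "Uncommon")]

# Base leaderboard points per discovery status; tier bonuses are not modeled.
BASE_POINTS = {"new": 100, "variation": 50}


def rarity_tier(scan_count: int) -> str:
    """Map a scan count to its rarity tier; anything above 100 is Common."""
    for upper_bound, tier in RARITY_TIERS:
        if scan_count <= upper_bound:
            return tier
    return "Common"


def base_points(discovery_status: str) -> int:
    """Base points for a discovery; unknown or 'existing' statuses score 0 (assumed)."""
    return BASE_POINTS.get(discovery_status, 0)
```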
| ## Authentication Strategy | |
| **Web UI Authentication**: | |
| - Gradio `auth` protects web interface from casual access | |
| - Requires username="admin" and password from `ADMIN_PASSWORD` env var | |
| - Prevents random users from manually creating piclets via UI | |
| - **Does NOT affect API access** - programmatic clients bypass this | |
| **API-Level Authentication**: | |
| - OAuth token verification for user attribution | |
| - Tokens verified via `https://huggingface.co/oauth/userinfo` | |
| - User profiles keyed by stable HF `sub` (user ID) | |
| - All discovery data is public (embracing open discovery) | |
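Token verification against the userinfo endpoint might look like the sketch below. It uses only the standard library and assumes the endpoint accepts the token as a standard `Authorization: Bearer` header and returns a JSON object containing `sub`. The function names are hypothetical, and the server may well use a different HTTP client.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

USERINFO_URL = "https://huggingface.co/oauth/userinfo"


def build_userinfo_request(hf_token: str) -> urllib.request.Request:
    """Prepare the userinfo request carrying the user's OAuth token."""
    return urllib.request.Request(
        USERINFO_URL, headers={"Authorization": f"Bearer {hf_token}"}
    )


def verify_token(hf_token: str, timeout: float = 10.0) -> Optional[dict]:
    """Return the user info dict for a valid token, or None if verification fails."""
    try:
        with urllib.request.urlopen(
            build_userinfo_request(hf_token), timeout=timeout
        ) as response:
            info = json.load(response)
    except (OSError, ValueError):
        # OSError covers HTTP/network errors and timeouts; ValueError covers bad JSON.
        return None
    # Profiles are keyed by the stable `sub` claim, so require it.
    return info if isinstance(info, dict) and "sub" in info else None
```

Keying profiles on `sub` rather than the username means a profile survives a username change.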
| ## Integration with Frontend | |
| The frontend (`../piclets/`) uses these **5 simple API calls**: | |
| ```javascript | |
| // Connect to server | |
| const client = await window.gradioClient.Client.connect("Fraser/piclets-server"); | |
| // 1. Scanner - Generate complete Piclet (ONE CALL - server does everything!) | |
| const scanResult = await client.predict("/generate_piclet", { | |
| image: imageFile, | |
| hf_token: userToken | |
| }); | |
| const { success, piclet, discoveryStatus, message } = scanResult.data[0]; | |
| // 2. User Collection - Get user's Piclets + stats | |
| const myPiclets = await client.predict("/get_user_piclets", { | |
| hf_token: userToken | |
| }); | |
| const { piclets, stats } = myPiclets.data[0]; | |
| // 3. Object Details - Get object info (canonical + variations) | |
| const objectInfo = await client.predict("/get_object_details", { | |
| object_name: "pillow" | |
| }); | |
| const { canonical, variations, totalScans } = objectInfo.data[0]; | |
| // 4. Activity Feed - Get recent discoveries | |
| const activity = await client.predict("/get_recent_activity", { | |
| limit: 20 | |
| }); | |
| // 5. Leaderboard - Get top users | |
| const leaders = await client.predict("/get_leaderboard", { | |
| limit: 10 | |
| }); | |
| ``` | |
| **Why This Design?** | |
| - **Clean API**: Only 5 endpoints, each with a clear purpose | |
| - **Security**: All AI orchestration happens server-side (can't be manipulated) | |
| - **Simplicity**: Frontend is pure UI, no complex orchestration logic | |
| - **Fairness**: Uses user's GPU quota, not server's | |
| - **Reliability**: Server handles retries and error recovery | |
| ## Development | |
| ### Local Testing | |
| ```bash | |
| pip install -r requirements.txt | |
| python app.py | |
| # Access at http://localhost:7860 | |
| ``` | |
| ### Deployment | |
| Push to HuggingFace Space repository: | |
| ```bash | |
| git add -A && git commit -m "Update" && git push | |
| ``` | |
| ### Environment Variables | |
| - `HF_TOKEN`: **Required** - HuggingFace write token for dataset operations (set in Space Secrets) | |
| - `ADMIN_PASSWORD`: Optional - Password for web UI access (set in Space Secrets) | |
| - `DATASET_REPO`: Target dataset (default: "Fraser/piclets") | |
| Note: Users' `hf_token` (passed in API calls) is separate from server's `HF_TOKEN` (for dataset writes). | |
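A small sketch of how these variables could be read at startup follows. `ServerConfig` and `load_config` are hypothetical names; only the variable names and the `Fraser/piclets` default come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    hf_token: str                  # server write token for dataset operations
    admin_password: Optional[str]  # None means no web UI password is configured
    dataset_repo: str


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read server settings from the environment, failing fast if HF_TOKEN is missing."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required for dataset writes")
    return ServerConfig(
        hf_token=token,
        admin_password=env.get("ADMIN_PASSWORD") or None,
        dataset_repo=env.get("DATASET_REPO", "Fraser/piclets"),
    )
```

Passing the environment in as a mapping keeps the loader easy to test without touching real Space Secrets.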
| ## Key Implementation Details | |
| ### AI Service Integration | |
| The server uses `gradio_client` to call external AI services with the user's token: | |
| - **JoyCaption** (`fancyfeast/joy-caption-alpha-two`): Detailed image captioning with brand/model recognition | |
| - **GPT-OSS-120B** (`amd/gpt-oss-120b-chatbot`): Concept generation and parsing | |
| - **Flux-Schnell** (`black-forest-labs/FLUX.1-schnell`): Anime-style Piclet image generation | |
| Each service is called with the user's `hf_token`, consuming their GPU quota. | |
| ### Concept Parsing | |
| GPT-OSS generates structured markdown with sections: | |
| - Canonical Object (specific brand/model, not generic) | |
| - Variation (distinctive attribute or "canonical") | |
| - Object Rarity (determines tier) | |
| - Monster Name, Type, Stats | |
| - Physical Stats (height, weight) | |
| - Personality, Description | |
| - Monster Image Prompt | |
| The parser uses regex to extract each section and clean the data. | |
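The section extraction could be sketched as follows. The exact markdown emitted by GPT-OSS is not shown in this document, so the sketch assumes each section starts with a `#`-style heading line followed by its content. `parse_concept` is a hypothetical name.

```python
import re

# Assumed heading format: one to six '#' characters followed by the section title.
HEADING = re.compile(r"^#{1,6}\s*(.+?)\s*$")


def parse_concept(markdown: str) -> dict:
    """Split concept markdown into {lowercased section title: trimmed body text}."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        # Lines before the first heading are ignored.
    return {title: "\n".join(body).strip() for title, body in sections.items()}
```

A caller would then look up keys such as `"canonical object"` or `"variation"` and fall back to defaults when a section is missing.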
| ### Variation Matching | |
| - Uses set intersection to find attribute overlap | |
| - 50% match threshold for variations | |
| - Attributes are normalized and trimmed | |
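The matching rule can be sketched with plain set operations. The document does not say what the 50% is measured against, so this sketch assumes the overlap is divided by the size of the larger attribute set. The function names are hypothetical.

```python
from typing import Iterable, Optional


def normalize_attributes(attributes: Iterable[str]) -> set:
    """Trim and lowercase attributes, dropping empty entries."""
    return {a.strip().lower() for a in attributes if a.strip()}


def match_ratio(candidate: Iterable[str], existing: Iterable[str]) -> float:
    """Share of overlapping attributes, relative to the larger set (assumed)."""
    c, e = normalize_attributes(candidate), normalize_attributes(existing)
    if not c or not e:
        return 0.0
    return len(c & e) / max(len(c), len(e))


def find_matching_variation(
    attributes: Iterable[str], variations: list, threshold: float = 0.5
) -> Optional[dict]:
    """Return the stored variation with the best overlap at or above the threshold."""
    attributes = list(attributes)
    best, best_ratio = None, 0.0
    for variation in variations:
        ratio = match_ratio(attributes, variation.get("attributes", []))
        if ratio >= threshold and ratio > best_ratio:
            best, best_ratio = variation, ratio
    return best
```

Under this assumption a scan tagged `["Velvet ", "red"]` matches a stored `["velvet", "blue"]` variation at exactly 50%, while `["leather"]` matches nothing and would become a new variation.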
| ### Caching Strategy | |
| - Local cache in `cache/` directory | |
| - HuggingFace hub caching for downloads | |
| - Temporary files for uploads | |
| ### Error Handling | |
| - Token verification before any operations | |
| - Graceful fallbacks for missing data | |
| - Default user profiles for new users | |
| - `try`/`except` blocks around all operations | |
| - Detailed logging for debugging | |
| ## Future Enhancements | |
| 1. **Background Removal**: Add server-side background removal (currently done on frontend) | |
| 2. **Activity Log**: Separate timeline file for better performance | |
| 3. **Image Storage**: Store Piclet images directly in dataset (currently stores URLs) | |
| 4. **Badges/Achievements**: Track discovery milestones | |
| 5. **Trading System**: Allow users to trade variations | |
| 6. **Seasonal Events**: Time-limited discoveries | |
| 7. **Rate Limiting**: Per-user rate limits to prevent abuse | |
| 8. **Caching**: Cache AI responses for identical images | |
| ## Security Considerations | |
| - **Token Verification**: All operations that take an `hf_token` (`generate_piclet`, `get_user_piclets`) verify the HF OAuth token via `https://huggingface.co/oauth/userinfo` | |
| - **User Attribution**: Discoveries tracked by stable HF `sub` (user ID), not username | |
| - **Fair GPU Usage**: Users consume their own GPU quota, not server's | |
| - **Public Data**: All discovery data is public by design (embracing open discovery) | |
| - **No Client Manipulation**: AI orchestration happens server-side only | |
| - **Input Validation**: File uploads and token formats validated | |
| - **No Sensitive Data**: No passwords or private info stored | |
| - **Future**: Rate limiting per user to prevent abuse |