| # CLAUDE.md | |
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
| ## Project Overview | |
| **FDAM AI Pipeline** - Fire Damage Assessment Methodology v4.0.1 implementation. An AI-powered system that generates professional Cleaning Specifications / Scope of Work documents for fire damage restoration. | |
| - **Deployment**: HuggingFace Spaces with Nvidia L4 (22GB VRAM per GPU, single GPU used) | |
| - **Local Dev**: RTX 4090 (24GB) - can run 4B model; use mock models for faster iteration | |
| - **Spec Document**: `FDAM_AI_Pipeline_Technical_Spec.md` is the authoritative technical reference | |
| ## Critical Constraints | |
| 1. **No External API Calls** - 100% locally-owned models only (no Claude/OpenAI APIs) | |
| 2. **Memory Budget** - Single L4 (22GB): ~10GB vision (4B) + ~4GB embedding + ~4GB reranker (~18GB used, ~4GB headroom) | |
| 3. **Processing Time** - 60-90 seconds per assessment is acceptable | |
| 4. **MVP Scope** - Phase 1 (PRE) and Phase 2 (PRA) only; no lab results processing yet | |
| 5. **Static RAG** - Knowledge base is pre-indexed; no user document uploads | |
| ## Tech Stack | |
| | Component | Technology | | |
| |-----------|------------| | |
| | UI Framework | Gradio 6.x | | |
| | Vision | Qwen/Qwen3-VL-4B-Thinking (via vLLM, single GPU) | | |
| | Embeddings | Qwen/Qwen3-VL-Embedding-2B (2048-dim) | | |
| | Reranker | Qwen/Qwen3-VL-Reranker-2B | | |
| | Inference | vLLM (single GPU, no tensor parallelism) | | |
| | Vector Store | ChromaDB 0.4.x | | |
| | Validation | Pydantic 2.x | | |
| | PDF Generation | Pandoc 3.x | | |
| | Package Manager | pip + requirements.txt | | |
| ## UI Components (Gradio 6.x) | |
| **Simplified 2-Tab UI:** Input + Results/Chat. | |
| Single-room workflow with integrated chat for Q&A and document modifications. | |
| ### Tab 1: Input | |
| Uses `gr.Accordion` for collapsible sections: | |
| - **Room Details** (open by default): Name, dimensions, ceiling height, facility classification, construction era | |
| - **Images** (open by default): Multi-file upload, gallery preview, image count | |
| - **Field Observations** (collapsed by default): 15 qualitative observation fields | |
| ### Tab 2: Results + Chat | |
| - **Results Display**: Annotated gallery, assessment stats (JSON), SOW document (markdown) | |
| - **Downloads**: Markdown and PDF export | |
| - **Chat Interface**: Q&A about results, document modifications via `gr.Chatbot(type="messages")` | |
| - **Quick Actions**: Pre-defined buttons for common queries | |
| The frontend maps each input field to a Gradio component: | |
| | Field | Component | Notes | | |
| |-------|-----------|-------| | |
| | Room Name | `gr.Textbox` | Required field | | |
| | Dimensions | `gr.Number` | Length, Width in feet | | |
| | Ceiling Height | `gr.Dropdown` + custom option | 8-20 ft presets | | |
| | Facility Classification | `gr.Radio` | operational, non-operational, public-childcare | | |
| | Construction Era | `gr.Radio` | pre-1980, 1980-2000, post-2000 | | |
| | Image Upload | `gr.Files(file_count="multiple")` | Batch upload, auto-assigned to room | | |
| | Chat | `gr.Chatbot(type="messages")` | Gradio 6 messages format | | |
| **Keyboard Shortcuts:** | |
| - `Ctrl+1`: Navigate to Input tab | |
| - `Ctrl+2`: Navigate to Results tab | |
| ## Development Commands | |
| ```sh | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Run locally with mock models | |
| MOCK_MODELS=true python app.py | |
| # Run with real models (~18GB VRAM: L4 on HuggingFace Spaces or a local RTX 4090) | |
| python app.py | |
| # Recommended tooling (install as dev dependencies) | |
| ruff check . # Linting | |
| ruff format . # Formatting | |
| mypy . # Type checking | |
| # Note: Tests removed - testing occurs on HuggingFace due to GPU/ChromaDB requirements | |
| ``` | |
| ## Architecture | |
| ### 6-Stage Processing Pipeline | |
| 1. **Input Validation** - Pydantic schema validation (schemas/input.py) | |
| 2. **Vision Analysis** - Per-image zone/material/condition detection (pipeline/vision.py) | |
| 3. **RAG Retrieval** - Disposition lookup, thresholds, methods (rag/retriever.py) | |
| 4. **FDAM Logic** - Disposition matrix application (pipeline/main.py) | |
| 5. **Calculations** - Surface areas, ACH, labor estimates (pipeline/calculations.py) | |
| 6. **Document Generation** - SOW, sampling plan, confidence report (pipeline/generator.py) | |
| ### Target Project Structure | |
| ``` | |
| ├── app.py                  # Gradio entry point | |
| ├── config/                 # Inference and app settings | |
| ├── models/                 # Model loading (mock vs real) | |
| ├── rag/                    # Chunking, vectorstore, retrieval | |
| ├── schemas/                # Pydantic input/output models | |
| ├── pipeline/               # Main processing logic + chat handler | |
| │   └── chat.py             # Chat handler for Q&A and document mods | |
| ├── ui/                     # Gradio UI components | |
| │   └── tabs/               # Tab modules | |
| │       ├── input_tab.py    # Combined input (room + images + observations) | |
| │       └── results_tab.py  # Results display + chat interface | |
| ├── RAG-KB/                 # Knowledge base source files | |
| ├── chroma_db/              # ChromaDB persistence (generated) | |
| └── sample_images/          # Sample fire damage images for testing | |
| ``` | |
| ## Domain Knowledge | |
| ### Zone Classifications | |
| - **Burn Zone**: Direct fire involvement, structural char, exposed/damaged elements | |
| - **Near-Field**: Adjacent to burn zone, heavy smoke/heat exposure, visible contamination | |
| - **Far-Field**: Smoke migration only, light deposits, no structural damage | |
| ### Condition Levels | |
| - **Background**: No visible contamination | |
| - **Light**: Faint discoloration, minimal deposits | |
| - **Moderate**: Visible film/deposits, surface color altered | |
| - **Heavy**: Thick deposits, surface texture obscured | |
| - **Structural Damage**: Physical damage requiring repair before cleaning | |
| ### Dispositions (FDAM §4.3) | |
| - **No Action**: Document only | |
| - **Clean**: Standard cleaning protocol | |
| - **Evaluate**: Requires professional judgment | |
| - **Remove**: Material must be removed | |
| - **Remove/Repair**: Remove and repair/replace | |
| ### Facility Classifications (affects thresholds) | |
| - **Operational**: Active workplace (higher thresholds: 500 µg/100cm² lead) | |
| - **Non-Operational**: Unoccupied (lower thresholds: 22 µg/100cm² lead) | |
| - **Public/Childcare**: Most stringent (EPA/HUD Oct 2024: 0.54 µg/100cm² floors) | |
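The classification-to-threshold mapping above can be kept as a small lookup table. The sketch below is illustrative only: `FacilityClass`, `LEAD_THRESHOLD_UG_PER_100CM2`, and `lead_threshold` are hypothetical names, and the public/childcare entry is the floor value quoted above, not a threshold for every surface type.

```python
from typing import Literal

# String values match the Facility Classification radio options in the UI table.
FacilityClass = Literal["operational", "non-operational", "public-childcare"]

# Lead surface thresholds in µg/100cm², copied from the list above.
# Assumption: the public-childcare figure applies to floors only.
LEAD_THRESHOLD_UG_PER_100CM2: dict[FacilityClass, float] = {
    "operational": 500.0,
    "non-operational": 22.0,
    "public-childcare": 0.54,
}


def lead_threshold(facility: FacilityClass) -> float:
    """Return the lead threshold (µg/100cm²) for a facility classification."""
    return LEAD_THRESHOLD_UG_PER_100CM2[facility]
```

In the pipeline itself these values would come from RAG retrieval over the knowledge base; a hard-coded table like this is mainly useful for mock mode and sanity checks.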
| ### Key Calculations | |
| - **ACH Formula**: `Units = (Volume × 4) / (CFM × 60)` per NADCA ACR 2021 | |
| - **Sample Density**: Varies by area size per FDAM §2.3 | |
| - **Ceiling Deck**: Enhanced sampling (1 per 2,500 SF per FDAM §4.5) | |
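A minimal sketch of the ACH and ceiling-deck calculations, under stated assumptions: the constant 4 is read as the target air changes per hour, volume is in cubic feet, inputs are positive, and results are rounded up to whole units (the formula itself does not specify rounding). The function names are hypothetical and are not taken from `pipeline/calculations.py`.

```python
import math


def air_scrubber_units(volume_cf: float, unit_cfm: float, target_ach: float = 4.0) -> int:
    """Units = (Volume × ACH) / (CFM × 60), rounded up to whole machines.

    Assumes positive inputs; rounding up is an assumption, not part of the formula.
    """
    return math.ceil((volume_cf * target_ach) / (unit_cfm * 60))


def ceiling_deck_samples(area_sf: float) -> int:
    """Enhanced ceiling-deck sampling: one sample per 2,500 SF, minimum one."""
    return max(1, math.ceil(area_sf / 2500))
```

For example, a 20 × 15 ft room with a 10 ft ceiling has a volume of 3,000 ft³; with a 500 CFM unit the formula gives 12,000 / 30,000 = 0.4, which rounds up to one unit.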
| ## RAG Knowledge Base | |
| Source documents in `/RAG-KB/`: | |
| - FDAM v4.0.1 methodology (primary reference) | |
| - BNL SOP IH75190 (metals clearance thresholds) | |
| - IICRC/RIA/CIRI Technical Guide (wildfire restoration) | |
| - Lab method guides (PLM, ICP-MS) | |
| **Chunking rules:** | |
| - Keep tables intact (never split markdown tables) | |
| - Preserve headers with content | |
| - Include metadata (source, category, section) | |
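The chunking rules above could be implemented along these lines. This is a minimal sketch, not the repository's `rag` chunker: `split_blocks`, `chunk_markdown`, and the `max_chars` limit are hypothetical. It treats blank-line-separated blocks as atomic, which keeps a markdown table whole because a table contains no blank lines, and it repeats the current header at the top of every chunk.

```python
def split_blocks(text: str) -> list[str]:
    """Split on blank lines. A markdown table has none inside, so it stays one block."""
    blocks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_markdown(text: str, source: str, category: str, max_chars: int = 800) -> list[dict]:
    """Pack blocks into chunks of roughly max_chars, never splitting a block."""
    chunks: list[dict] = []
    header = ""
    body: list[str] = []

    def flush() -> None:
        # Emit the pending blocks as one chunk, prefixed with the current header.
        if body:
            parts = ([header] if header else []) + body
            chunks.append({
                "text": "\n\n".join(parts),
                "metadata": {
                    "source": source,
                    "category": category,
                    "section": header.lstrip("# ").strip(),
                },
            })
            body.clear()

    for block in split_blocks(text):
        if block.startswith("#"):
            flush()
            # The header may be followed by text on the next line without a blank line.
            header, _, rest = block.partition("\n")
            if rest:
                body.append(rest)
            continue
        if body and sum(len(b) for b in body) + len(block) > max_chars:
            flush()
        body.append(block)
    flush()
    return chunks
```

A block larger than `max_chars` (a long table, for instance) is emitted as its own oversized chunk and is never split. A header with no content before the next header produces no chunk.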
| ## Confidence Framework | |
| | Score | Level | Action | | |
| |-------|-------|--------| | |
| | ≥90% | Very High | Accept without review | | |
| | 70-89% | High | Accept, note in report | | |
| | 50-69% | Moderate | Flag for human review | | |
| | <50% | Low | Require human verification | | |
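The table above maps directly onto a threshold function. In this sketch `classify_confidence` is a hypothetical name and scores are assumed to be percentages from 0 to 100. Because the table lists integer bands, fractional scores such as 89.5 are assigned here to the lower band, which is an assumption.

```python
from typing import Literal

ConfidenceLevel = Literal["Very High", "High", "Moderate", "Low"]


def classify_confidence(score: float) -> tuple[ConfidenceLevel, str]:
    """Map a confidence score (percent) to its level and required action."""
    if score >= 90:
        return "Very High", "Accept without review"
    if score >= 70:
        return "High", "Accept, note in report"
    if score >= 50:
        return "Moderate", "Flag for human review"
    return "Low", "Require human verification"
```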
| ## Model Loading | |
| All 3 models are loaded at startup (~18GB total on single L4 GPU): | |
| ```python | |
| import torch | |
| from vllm import LLM, SamplingParams | |
| # Vision model via vLLM (single GPU, no tensor parallelism) | |
| vision_model = LLM( | |
| model="Qwen/Qwen3-VL-4B-Thinking", | |
| tensor_parallel_size=1, # Single GPU | |
| trust_remote_code=True, | |
| gpu_memory_utilization=0.80, | |
| max_model_len=16384, | |
| ) | |
| # Embedding and Reranker use official Qwen3VL loaders | |
| from scripts.qwen3_vl import Qwen3VLEmbedder, Qwen3VLReranker | |
| embedding_model = Qwen3VLEmbedder("Qwen/Qwen3-VL-Embedding-2B", torch_dtype=torch.bfloat16) | |
| reranker_model = Qwen3VLReranker("Qwen/Qwen3-VL-Reranker-2B", torch_dtype=torch.bfloat16) | |
| ``` | |
| Expected memory usage (~18GB total on single L4): | |
| - Vision model (4B BF16): ~10GB | |
| - Embedding model (2B): ~4GB | |
| - Reranker model (2B): ~4GB | |
| - Headroom: ~4GB for KV cache and overhead | |
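The budget arithmetic can be checked with a few lines. The figures are the approximate values listed above; the helper names and the 2GB minimum-headroom default are assumptions for illustration.

```python
# Approximate figures in GB, copied from the memory budget above.
L4_USABLE_GB = 22.0
RTX_4090_GB = 24.0
MODEL_BUDGET_GB = {
    "vision_4b_bf16": 10.0,
    "embedding_2b": 4.0,
    "reranker_2b": 4.0,
}


def headroom_gb(total_gb: float, budget: dict[str, float] = MODEL_BUDGET_GB) -> float:
    """VRAM left after loading every model in the budget."""
    return total_gb - sum(budget.values())


def fits(total_gb: float, min_headroom_gb: float = 2.0) -> bool:
    """True when the stack loads with at least min_headroom_gb to spare (assumed threshold)."""
    return headroom_gb(total_gb) >= min_headroom_gb
```

Note that vLLM's `gpu_memory_utilization` is a fraction of total device memory reserved for that engine (weights, activations, and KV cache), so the value passed to `LLM(...)` is worth checking against this budget when the embedding and reranker models share the same GPU.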
| ## Local Development Strategy | |
| The RTX 4090 (24GB VRAM) can run the 4B model stack (~18GB). Two options: | |
| **Option A: Real Models Locally** | |
| 1. Set `MOCK_MODELS=false` (or omit - defaults to false) | |
| 2. Models will download and load (~18GB VRAM) | |
| 3. Full inference testing locally | |
| **Option B: Mock Models (faster iteration)** | |
| 1. Set `MOCK_MODELS=true` environment variable | |
| 2. Mock models return realistic JSON matching the vision output schema, plus 2048-dim mock embeddings | |
| 3. Test pipeline logic, UI, calculations without real inference | |
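One way the mock switch in Option B could look, as a sketch only. `use_mock_models`, `MockEmbedder`, and its `embed` method are hypothetical and do not mirror the real `Qwen3VLEmbedder` interface, which is not shown here. The mock returns deterministic 2048-dim vectors so that repeated runs index and retrieve identically.

```python
import hashlib
import os
import random

EMBEDDING_DIM = 2048  # matches Qwen3-VL-Embedding-2B per the tech stack table


def use_mock_models() -> bool:
    """Read MOCK_MODELS from the environment; defaults to false when unset."""
    return os.environ.get("MOCK_MODELS", "false").strip().lower() == "true"


class MockEmbedder:
    """Hypothetical stand-in that returns deterministic pseudo-random vectors."""

    def embed(self, text: str) -> list[float]:
        # Seed from a hash of the text so the same input always maps to the same vector.
        seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
        rng = random.Random(seed)
        return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
```

These vectors carry no semantic meaning, so mock mode exercises the pipeline wiring and ChromaDB dimensions but says nothing about retrieval quality.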
| **Deployment:** | |
| 1. Deploy to HuggingFace Spaces for production testing | |
| 2. Request build logs after deployment to confirm success | |
| 3. After changing embedding dimensions, rebuild ChromaDB: `python -m rag.index_builder --rebuild` | |
| ## Code Style | |
| - Use `Literal["a", "b", "c"]` unions instead of Enum for simple string choices | |
| - Pydantic models for all input/output validation | |
| - Explicit return types on public functions | |
| - Result types or explicit error returns over thrown exceptions | |
| - Group imports: stdlib → third-party → local | |
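A short sketch of the `Literal` and result-type conventions. To stay dependency-free it uses dataclasses where the project would use Pydantic models; `Ok`, `Err`, and `parse_construction_era` are hypothetical names, and the era strings are the radio options from the UI table.

```python
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar, Union, get_args

T = TypeVar("T")

# Literal union instead of an Enum for a simple string choice.
ConstructionEra = Literal["pre-1980", "1980-2000", "post-2000"]


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    message: str


def parse_construction_era(raw: str) -> Union[Ok[ConstructionEra], Err]:
    """Return Ok(era) for a known era string, or Err instead of raising."""
    normalized = raw.strip().lower()
    if normalized in get_args(ConstructionEra):
        return Ok(normalized)
    return Err(f"unknown construction era: {raw!r}")
```

Callers branch on the returned type (`isinstance(result, Err)`) rather than wrapping the call in `try`/`except`, which keeps failure paths visible in the function signature.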
| ## WSL Note | |
| Dev servers must be exposed for WSL access. Bind Gradio to `0.0.0.0` through the `server_name` argument of `launch()`: | |
| ```python | |
| app.launch(server_name="0.0.0.0", server_port=7860) | |
| ``` | |