# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**FDAM AI Pipeline** - Fire Damage Assessment Methodology v4.0.1 implementation. An AI-powered system that generates professional Cleaning Specifications / Scope of Work documents for fire damage restoration.
- **Deployment**: HuggingFace Spaces with Nvidia L4 (22GB VRAM per GPU, single GPU used)
- **Local Dev**: RTX 4090 (24GB) - can run 4B model; use mock models for faster iteration
- **Spec Document**: `FDAM_AI_Pipeline_Technical_Spec.md` is the authoritative technical reference
## Critical Constraints
1. **No External API Calls** - 100% locally-owned models only (no Claude/OpenAI APIs)
2. **Memory Budget** - Single L4 (22GB): ~10GB vision (4B) + ~4GB embedding + ~4GB reranker (~18GB used, ~4GB headroom)
3. **Processing Time** - 60-90 seconds per assessment is acceptable
4. **MVP Scope** - Phase 1 (PRE) and Phase 2 (PRA) only; no lab results processing yet
5. **Static RAG** - Knowledge base is pre-indexed; no user document uploads
## Tech Stack
| Component | Technology |
|-----------|------------|
| UI Framework | Gradio 6.x |
| Vision | Qwen/Qwen3-VL-4B-Thinking (via vLLM, single GPU) |
| Embeddings | Qwen/Qwen3-VL-Embedding-2B (2048-dim) |
| Reranker | Qwen/Qwen3-VL-Reranker-2B |
| Inference | vLLM (single GPU, no tensor parallelism) |
| Vector Store | ChromaDB 0.4.x |
| Validation | Pydantic 2.x |
| PDF Generation | Pandoc 3.x |
| Package Manager | pip + requirements.txt |
## UI Components (Gradio 6.x)
**Simplified 2-Tab UI:** Input + Results/Chat.
Single-room workflow with integrated chat for Q&A and document modifications.
### Tab 1: Input
Uses `gr.Accordion` for collapsible sections:
- **Room Details** (open by default): Name, dimensions, ceiling height, facility classification, construction era
- **Images** (open by default): Multi-file upload, gallery preview, image count
- **Field Observations** (collapsed by default): 15 qualitative observation fields
### Tab 2: Results + Chat
- **Results Display**: Annotated gallery, assessment stats (JSON), SOW document (markdown)
- **Downloads**: Markdown and PDF export
- **Chat Interface**: Q&A about results, document modifications via `gr.Chatbot(type="messages")`
- **Quick Actions**: Pre-defined buttons for common queries
The frontend uses optimized input components:
| Field | Component | Notes |
|-------|-----------|-------|
| Room Name | `gr.Textbox` | Required field |
| Dimensions | `gr.Number` | Length, Width in feet |
| Ceiling Height | `gr.Dropdown` + custom option | 8-20 ft presets |
| Facility Classification | `gr.Radio` | operational, non-operational, public-childcare |
| Construction Era | `gr.Radio` | pre-1980, 1980-2000, post-2000 |
| Image Upload | `gr.Files(file_count="multiple")` | Batch upload, auto-assigned to room |
| Chat | `gr.Chatbot(type="messages")` | Gradio 6 messages format |
**Keyboard Shortcuts:**
- `Ctrl+1`: Navigate to Input tab
- `Ctrl+2`: Navigate to Results tab
## Development Commands
```sh
# Install dependencies
pip install -r requirements.txt
# Run locally with mock models
MOCK_MODELS=true python app.py
# Run with real models (needs ~18GB VRAM: HuggingFace L4 or a local RTX 4090)
python app.py
# Recommended tooling (install as dev dependencies)
ruff check . # Linting
ruff format . # Formatting
mypy . # Type checking
# Note: Tests removed - testing occurs on HuggingFace due to GPU/ChromaDB requirements
```
## Architecture
### 6-Stage Processing Pipeline
1. **Input Validation** - Pydantic schema validation (schemas/input.py)
2. **Vision Analysis** - Per-image zone/material/condition detection (pipeline/vision.py)
3. **RAG Retrieval** - Disposition lookup, thresholds, methods (rag/retriever.py)
4. **FDAM Logic** - Disposition matrix application (pipeline/main.py)
5. **Calculations** - Surface areas, ACH, labor estimates (pipeline/calculations.py)
6. **Document Generation** - SOW, sampling plan, confidence report (pipeline/generator.py)
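The staged flow above can be sketched as an ordered list of named callables that each take and return a shared context. This is a minimal illustration only: `run_stages`, `validate_input`, and `calculate_floor_area` are hypothetical names, and the stage bodies are placeholders rather than the logic in `pipeline/`.

```python
from typing import Any, Callable

Context = dict[str, Any]
Stage = Callable[[Context], Context]


def run_stages(stages: list[tuple[str, Stage]], context: Context) -> Context:
    """Apply each stage in order and record which stages completed."""
    completed: list[str] = []
    for name, stage in stages:
        context = stage(context)
        completed.append(name)
    return {**context, "completed_stages": completed}


# Placeholder stages (hypothetical); the real ones live in schemas/ and pipeline/.
def validate_input(ctx: Context) -> Context:
    return {**ctx, "validated": True}


def calculate_floor_area(ctx: Context) -> Context:
    return {**ctx, "floor_area_sf": ctx["length_ft"] * ctx["width_ft"]}


result = run_stages(
    [("input_validation", validate_input), ("calculations", calculate_floor_area)],
    {"length_ft": 20.0, "width_ft": 15.0},
)
print(result["completed_stages"], result["floor_area_sf"])
```

Keeping stages as plain functions over one context makes it easy to swap a real stage for a mock one during local development.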
### Target Project Structure
```
├── app.py                    # Gradio entry point
├── config/                   # Inference and app settings
├── models/                   # Model loading (mock vs real)
├── rag/                      # Chunking, vectorstore, retrieval
├── schemas/                  # Pydantic input/output models
├── pipeline/                 # Main processing logic + chat handler
│   └── chat.py               # Chat handler for Q&A and document mods
├── ui/                       # Gradio UI components
│   └── tabs/                 # Tab modules
│       ├── input_tab.py      # Combined input (room + images + observations)
│       └── results_tab.py    # Results display + chat interface
├── RAG-KB/                   # Knowledge base source files
├── chroma_db/                # ChromaDB persistence (generated)
└── sample_images/            # Sample fire damage images for testing
```
## Domain Knowledge
### Zone Classifications
- **Burn Zone**: Direct fire involvement, structural char, exposed/damaged elements
- **Near-Field**: Adjacent to burn zone, heavy smoke/heat exposure, visible contamination
- **Far-Field**: Smoke migration only, light deposits, no structural damage
### Condition Levels
- **Background**: No visible contamination
- **Light**: Faint discoloration, minimal deposits
- **Moderate**: Visible film/deposits, surface color altered
- **Heavy**: Thick deposits, surface texture obscured
- **Structural Damage**: Physical damage requiring repair before cleaning
### Dispositions (FDAM §4.3)
- **No Action**: Document only
- **Clean**: Standard cleaning protocol
- **Evaluate**: Requires professional judgment
- **Remove**: Material must be removed
- **Remove/Repair**: Remove and repair/replace
### Facility Classifications (affects thresholds)
- **Operational**: Active workplace (higher thresholds: 500 µg/100cm² lead)
- **Non-Operational**: Unoccupied (lower thresholds: 22 µg/100cm² lead)
- **Public/Childcare**: Most stringent (EPA/HUD Oct 2024: 0.54 µg/100cm² floors)
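The classification-to-threshold mapping above could be held in a small lookup keyed by the same strings the UI radio uses. This is a sketch under assumptions: the names `FacilityClass`, `LEAD_THRESHOLD_UG_PER_100CM2`, and `exceeds_lead_threshold` are hypothetical, the strict `>` comparison is an assumption, and only the lead values listed above are included.

```python
from typing import Literal

FacilityClass = Literal["operational", "non-operational", "public-childcare"]

# Lead surface thresholds in µg/100cm², copied from the list above.
# The public-childcare figure is the floor value; other surfaces are not covered.
LEAD_THRESHOLD_UG_PER_100CM2: dict[FacilityClass, float] = {
    "operational": 500.0,
    "non-operational": 22.0,
    "public-childcare": 0.54,
}


def exceeds_lead_threshold(facility: FacilityClass, measured: float) -> bool:
    """Return True when a measured lead loading is above the facility threshold.

    Treating "above" as a strict comparison is an assumption of this sketch.
    """
    return measured > LEAD_THRESHOLD_UG_PER_100CM2[facility]


# The same 30 µg/100cm² reading passes in one classification and fails in another.
print(exceeds_lead_threshold("operational", 30.0))
print(exceeds_lead_threshold("non-operational", 30.0))
```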
### Key Calculations
- **ACH Formula**: `Units = (Volume × 4) / (CFM × 60)` per NADCA ACR 2021
- **Sample Density**: Varies by area size per FDAM §2.3
- **Ceiling Deck**: Enhanced sampling (1 per 2,500 SF per FDAM §4.5)
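As a worked example of the ACH formula, the sketch below reads the constant 4 as the target air changes per hour and 60 as minutes per hour. The function name `air_scrubber_units`, the cubic-feet unit for volume, and rounding up to a whole unit are assumptions of this sketch, not details taken from `pipeline/calculations.py`.

```python
import math


def air_scrubber_units(volume_cubic_ft: float, cfm_per_unit: float, ach: float = 4.0) -> int:
    """Units = (Volume x ACH) / (CFM x 60), rounded up to a whole unit."""
    if volume_cubic_ft <= 0 or cfm_per_unit <= 0:
        raise ValueError("volume and CFM must be positive")
    raw_units = (volume_cubic_ft * ach) / (cfm_per_unit * 60)
    # Rounding up (and never returning zero) is an assumption of this sketch.
    return max(1, math.ceil(raw_units))


# A 40 ft x 30 ft room with a 12 ft ceiling is 14,400 cubic feet.
# (14400 x 4) / (500 x 60) = 1.92, so two 500 CFM units.
print(air_scrubber_units(40 * 30 * 12, cfm_per_unit=500))
```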
## RAG Knowledge Base
Source documents in `/RAG-KB/`:
- FDAM v4.0.1 methodology (primary reference)
- BNL SOP IH75190 (metals clearance thresholds)
- IICRC/RIA/CIRI Technical Guide (wildfire restoration)
- Lab method guides (PLM, ICP-MS)
**Chunking rules:**
- Keep tables intact (never split markdown tables)
- Preserve headers with content
- Include metadata (source, category, section)
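The three chunking rules can be illustrated with a small standard-library splitter. It is a sketch, not the code in `rag/`: `Chunk`, `chunk_markdown`, and `max_chars` are hypothetical names, any line starting with `#` is assumed to be a header, and a table is assumed to be a run of consecutive lines starting with `|`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    category: str
    section: str


def chunk_markdown(text: str, source: str, category: str, max_chars: int = 800) -> list[Chunk]:
    """Split markdown into chunks without breaking tables or orphaning headers."""
    chunks: list[Chunk] = []
    section = ""
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            header = f"{section}\n" if section else ""
            # The header line is repeated at the top of every chunk from its section.
            chunks.append(Chunk(header + body, source, category, section.lstrip("# ").strip()))
        buffer.clear()

    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#"):
            flush()
            section = line.strip()
            continue
        buffer.append(line)
        in_table = line.lstrip().startswith("|")
        next_in_table = i + 1 < len(lines) and lines[i + 1].lstrip().startswith("|")
        size = sum(len(item) + 1 for item in buffer)
        # Never cut between two table rows, even when the chunk is over budget.
        if size >= max_chars and not (in_table and next_in_table):
            flush()
    flush()
    return chunks


sample = "## Thresholds\nIntro line.\n| A | B |\n|---|---|\n| 1 | 2 |\n| 3 | 4 |\n## Next\nMore text.\n"
chunks = chunk_markdown(sample, source="example.md", category="thresholds", max_chars=20)
print(len(chunks), chunks[0].section)
```

With the deliberately small `max_chars=20`, the table still lands whole in the first chunk because the size check is skipped between table rows.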
## Confidence Framework
| Score | Level | Action |
|-------|-------|--------|
| ≥90% | Very High | Accept without review |
| 70-89% | High | Accept, note in report |
| 50-69% | Moderate | Flag for human review |
| <50% | Low | Require human verification |
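The table maps directly onto a threshold function. The sketch below assumes scores are percentages from 0 to 100 and that fractional scores fall into the band whose lower bound they meet (so 89.5 is High); `classify_confidence` and `ConfidenceLevel` are hypothetical names.

```python
from typing import Literal

ConfidenceLevel = Literal["Very High", "High", "Moderate", "Low"]


def classify_confidence(score: float) -> tuple[ConfidenceLevel, str]:
    """Map a confidence score (percent) to its level and required action."""
    if score >= 90:
        return ("Very High", "Accept without review")
    if score >= 70:
        return ("High", "Accept, note in report")
    if score >= 50:
        return ("Moderate", "Flag for human review")
    return ("Low", "Require human verification")


level, action = classify_confidence(72.0)
print(level, "-", action)
```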
## Model Loading
All 3 models are loaded at startup (~18GB total on single L4 GPU):
```python
import torch
from vllm import LLM, SamplingParams

# Embedding and Reranker use official Qwen3VL loaders
from scripts.qwen3_vl import Qwen3VLEmbedder, Qwen3VLReranker

# Vision model via vLLM (single GPU, no tensor parallelism)
vision_model = LLM(
    model="Qwen/Qwen3-VL-4B-Thinking",
    tensor_parallel_size=1,  # Single GPU
    trust_remote_code=True,
    gpu_memory_utilization=0.80,
    max_model_len=16384,
)

# Both 2B models are loaded in bfloat16
embedding_model = Qwen3VLEmbedder("Qwen/Qwen3-VL-Embedding-2B", torch_dtype=torch.bfloat16)
reranker_model = Qwen3VLReranker("Qwen/Qwen3-VL-Reranker-2B", torch_dtype=torch.bfloat16)
```
Expected memory usage (~18GB total on single L4):
- Vision model (4B BF16): ~10GB
- Embedding model (2B): ~4GB
- Reranker model (2B): ~4GB
- Headroom: ~4GB for KV cache and overhead
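The budget arithmetic above can be checked in a few lines. The figures are the approximate ones from the list; the helper names `headroom_gb` and `vllm_reservation_gb` are hypothetical.

```python
TOTAL_VRAM_GB = 22.0

# Approximate per-model figures from the list above.
MODEL_BUDGET_GB = {
    "vision_4b_bf16": 10.0,
    "embedding_2b": 4.0,
    "reranker_2b": 4.0,
}


def headroom_gb(total_gb: float, budget: dict[str, float]) -> float:
    """VRAM left over after all models are loaded."""
    return total_gb - sum(budget.values())


def vllm_reservation_gb(total_gb: float, utilization: float) -> float:
    """Memory a vLLM engine may claim for a given gpu_memory_utilization."""
    return round(total_gb * utilization, 1)


print(headroom_gb(TOTAL_VRAM_GB, MODEL_BUDGET_GB))
print(vllm_reservation_gb(TOTAL_VRAM_GB, 0.80))
```

vLLM documents `gpu_memory_utilization` as the fraction of device memory the engine may use, so 0.80 on a 22GB card would let the vision engine claim about 17.6GB by itself. If the ~10GB vision figure is the target, a lower value may be needed; this is worth verifying on the actual deployment.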
## Local Development Strategy
The RTX 4090 (24GB VRAM) can run the 4B model stack (~18GB). Two options:
**Option A: Real Models Locally**
1. Set `MOCK_MODELS=false` (or omit - defaults to false)
2. Models will download and load (~18GB VRAM)
3. Full inference testing locally
**Option B: Mock Models (faster iteration)**
1. Set `MOCK_MODELS=true` environment variable
2. Mock models return realistic JSON matching the vision output schema, plus 2048-dim embeddings
3. Test pipeline logic, UI, calculations without real inference
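A minimal sketch of the `MOCK_MODELS` switch and a mock embedder that matches the 2048-dim contract. `use_mock_models` and `MockEmbedder` are hypothetical names, and the seeded pseudo-random vector is an assumption chosen so the same text always yields the same embedding; the real code in `models/` may differ.

```python
import os
import random

EMBEDDING_DIM = 2048  # Must match Qwen3-VL-Embedding-2B so ChromaDB stays compatible


def use_mock_models() -> bool:
    """Read the MOCK_MODELS flag; anything other than 'true' means real models."""
    return os.environ.get("MOCK_MODELS", "false").strip().lower() == "true"


class MockEmbedder:
    """Stand-in embedder returning deterministic vectors without a GPU."""

    def embed(self, text: str) -> list[float]:
        rng = random.Random(text)  # Seeded by the input, so results repeat
        return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]


os.environ["MOCK_MODELS"] = "true"
if use_mock_models():
    vector = MockEmbedder().embed("light soot on ceiling deck")
    print(len(vector))
```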
**Deployment:**
1. Deploy to HuggingFace Spaces for production testing
2. Request build logs after deployment to confirm success
3. After changing embedding dimensions, rebuild ChromaDB: `python -m rag.index_builder --rebuild`
## Code Style
- Use `Literal["a", "b", "c"]` unions instead of Enum for simple string choices
- Pydantic models for all input/output validation
- Explicit return types on public functions
- Result types or explicit error returns over thrown exceptions
- Group imports: stdlib → third-party → local
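Two of these conventions, `Literal` unions and explicit result types, can be shown together. This is a standard-library sketch: `Ok`, `Err`, `Result`, and `parse_construction_era` are hypothetical names, and the repository would use Pydantic models for the validation itself.

```python
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar, Union, get_args

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    message: str


Result = Union[Ok[T], Err]

# Literal union instead of an Enum for a simple string choice.
ConstructionEra = Literal["pre-1980", "1980-2000", "post-2000"]


def parse_construction_era(raw: str) -> Result[ConstructionEra]:
    """Return Ok(era) for a known era, or Err with a message; never raises."""
    normalized = raw.strip().lower()
    if normalized in get_args(ConstructionEra):
        return Ok(normalized)  # type: ignore[arg-type]
    return Err(f"unknown construction era: {raw!r}")


outcome = parse_construction_era(" Pre-1980 ")
if isinstance(outcome, Ok):
    print(outcome.value)
else:
    print(outcome.message)
```

Callers branch on `isinstance` rather than wrapping the call in `try`/`except`, which keeps the failure path visible in the function's return type.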
## WSL Note
Dev servers must be exposed for WSL access. Bind Gradio to `0.0.0.0` through the `server_name` argument of `launch()`:
```python
app.launch(server_name="0.0.0.0", server_port=7860)
```