
	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Project Overview

	FDAM AI Pipeline - Fire Damage Assessment Methodology v4.0.1 implementation. An AI-powered system that generates professional Cleaning Specifications / Scope of Work documents for fire damage restoration.

	- Deployment: HuggingFace Spaces with Nvidia L4 (22GB VRAM per GPU, single GPU used)
	- Local Dev: RTX 4090 (24GB) - can run 4B model; use mock models for faster iteration
	- Spec Document: `FDAM_AI_Pipeline_Technical_Spec.md` is the authoritative technical reference

	## Critical Constraints

	1. No External API Calls - 100% locally-owned models only (no Claude/OpenAI APIs)
	2. Memory Budget - Single L4 (22GB): ~10GB vision (4B) + ~4GB embedding + ~4GB reranker (~18GB used, ~4GB headroom)
	3. Processing Time - 60-90 seconds per assessment is acceptable
	4. MVP Scope - Phase 1 (PRE) and Phase 2 (PRA) only; no lab results processing yet
	5. Static RAG - Knowledge base is pre-indexed; no user document uploads

	## Tech Stack

	| Component | Technology |
	|-----------|------------|
	| UI Framework | Gradio 6.x |
	| Vision | Qwen/Qwen3-VL-4B-Thinking (via vLLM, single GPU) |
	| Embeddings | Qwen/Qwen3-VL-Embedding-2B (2048-dim) |
	| Reranker | Qwen/Qwen3-VL-Reranker-2B |
	| Inference | vLLM (single GPU, no tensor parallelism) |
	| Vector Store | ChromaDB 0.4.x |
	| Validation | Pydantic 2.x |
	| PDF Generation | Pandoc 3.x |
	| Package Manager | pip + requirements.txt |

	## UI Components (Gradio 6.x)

	Simplified 2-Tab UI: Input + Results/Chat.
	Single-room workflow with integrated chat for Q&A and document modifications.

	### Tab 1: Input
	Uses `gr.Accordion` for collapsible sections:
	- Room Details (open by default): Name, dimensions, ceiling height, facility classification, construction era
	- Images (open by default): Multi-file upload, gallery preview, image count
	- Field Observations (collapsed by default): 15 qualitative observation fields

	### Tab 2: Results + Chat
	- Results Display: Annotated gallery, assessment stats (JSON), SOW document (markdown)
	- Downloads: Markdown and PDF export
	- Chat Interface: Q&A about results, document modifications via `gr.Chatbot(type="messages")`
	- Quick Actions: Pre-defined buttons for common queries

	The frontend uses optimized input components:

	| Field | Component | Notes |
	|-------|-----------|-------|
	| Room Name | `gr.Textbox` | Required field |
	| Dimensions | `gr.Number` | Length, Width in feet |
	| Ceiling Height | `gr.Dropdown` + custom option | 8-20 ft presets |
	| Facility Classification | `gr.Radio` | operational, non-operational, public-childcare |
	| Construction Era | `gr.Radio` | pre-1980, 1980-2000, post-2000 |
	| Image Upload | `gr.Files(file_count="multiple")` | Batch upload, auto-assigned to room |
	| Chat | `gr.Chatbot(type="messages")` | Gradio 6 messages format |

	Keyboard Shortcuts:
	- `Ctrl+1`: Navigate to Input tab
	- `Ctrl+2`: Navigate to Results tab

	## Development Commands

	```sh
	# Install dependencies
	pip install -r requirements.txt

	# Run locally with mock models
	MOCK_MODELS=true python app.py

	# Run with real models (~18GB VRAM: the L4 on HuggingFace Spaces or a local RTX 4090)
	python app.py

	# Recommended tooling (install as dev dependencies)
	ruff check . # Linting
	ruff format . # Formatting
	mypy . # Type checking
	# Note: Tests removed - testing occurs on HuggingFace due to GPU/ChromaDB requirements
	```

	## Architecture

	### 6-Stage Processing Pipeline
	1. Input Validation - Pydantic schema validation (schemas/input.py)
	2. Vision Analysis - Per-image zone/material/condition detection (pipeline/vision.py)
	3. RAG Retrieval - Disposition lookup, thresholds, methods (rag/retriever.py)
	4. FDAM Logic - Disposition matrix application (pipeline/main.py)
	5. Calculations - Surface areas, ACH, labor estimates (pipeline/calculations.py)
	6. Document Generation - SOW, sampling plan, confidence report (pipeline/generator.py)

	### Target Project Structure
	```
	├── app.py                  # Gradio entry point
	├── config/                 # Inference and app settings
	├── models/                 # Model loading (mock vs real)
	├── rag/                    # Chunking, vectorstore, retrieval
	├── schemas/                # Pydantic input/output models
	├── pipeline/               # Main processing logic + chat handler
	│   └── chat.py             # Chat handler for Q&A and document mods
	├── ui/                     # Gradio UI components
	│   └── tabs/               # Tab modules
	│       ├── input_tab.py    # Combined input (room + images + observations)
	│       └── results_tab.py  # Results display + chat interface
	├── RAG-KB/                 # Knowledge base source files
	├── chroma_db/              # ChromaDB persistence (generated)
	└── sample_images/          # Sample fire damage images for testing
	```

	## Domain Knowledge

	### Zone Classifications
	- Burn Zone: Direct fire involvement, structural char, exposed/damaged elements
	- Near-Field: Adjacent to burn zone, heavy smoke/heat exposure, visible contamination
	- Far-Field: Smoke migration only, light deposits, no structural damage

	### Condition Levels
	- Background: No visible contamination
	- Light: Faint discoloration, minimal deposits
	- Moderate: Visible film/deposits, surface color altered
	- Heavy: Thick deposits, surface texture obscured
	- Structural Damage: Physical damage requiring repair before cleaning

	### Dispositions (FDAM §4.3)
	- No Action: Document only
	- Clean: Standard cleaning protocol
	- Evaluate: Requires professional judgment
	- Remove: Material must be removed
	- Remove/Repair: Remove and repair/replace

	### Facility Classifications (affects thresholds)
	- Operational: Active workplace (higher thresholds: 500 µg/100cm² lead)
	- Non-Operational: Unoccupied (lower thresholds: 22 µg/100cm² lead)
	- Public/Childcare: Most stringent (EPA/HUD Oct 2024: 0.54 µg/100cm² floors)
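
The mapping from classification to lead threshold can be sketched as a simple lookup. This is only an illustration: the values are copied from the list above, the names `FacilityClass`, `LEAD_THRESHOLDS_UG_PER_100CM2`, and `lead_threshold` are hypothetical, and the pipeline itself is described as retrieving thresholds through RAG rather than from a hard-coded table.

```python
from typing import Literal

# Same string values as the Facility Classification radio options.
FacilityClass = Literal["operational", "non-operational", "public-childcare"]

# Lead surface thresholds in µg/100cm², copied from the list above.
LEAD_THRESHOLDS_UG_PER_100CM2: dict[str, float] = {
    "operational": 500.0,
    "non-operational": 22.0,
    "public-childcare": 0.54,  # floor value (EPA/HUD Oct 2024)
}


def lead_threshold(facility: FacilityClass) -> float:
    """Return the lead surface threshold (µg/100cm²) for a facility class."""
    return LEAD_THRESHOLDS_UG_PER_100CM2[facility]


print(lead_threshold("operational"))       # 500.0
print(lead_threshold("public-childcare"))  # 0.54
```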

	### Key Calculations
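	- ACH Formula: `Units = (Volume × 4) / (CFM × 60)` per NADCA ACR 2021, where Volume is the room volume in ft³, 4 is the target air changes per hour, CFM is the airflow of one unit, and 60 converts minutes to hours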
	- Sample Density: Varies by area size per FDAM §2.3
	- Ceiling Deck: Enhanced sampling (1 per 2,500 SF per FDAM §4.5)
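
A minimal sketch of the ACH and ceiling deck calculations, assuming dimensions in feet and positive inputs. The function names are hypothetical, and rounding up to whole units and whole samples is an assumption of this sketch, not something the formulas above state.

```python
import math


def room_volume_cubic_ft(length_ft: float, width_ft: float, ceiling_height_ft: float) -> float:
    """Room volume in ft³ from the Room Details inputs."""
    return length_ft * width_ft * ceiling_height_ft


def air_scrubber_units(volume_cubic_ft: float, unit_cfm: float, ach: float = 4.0) -> int:
    """Units = (Volume × ACH) / (CFM × 60), rounded up to whole units."""
    return math.ceil((volume_cubic_ft * ach) / (unit_cfm * 60))


def ceiling_deck_samples(area_sq_ft: float) -> int:
    """One sample per 2,500 SF of ceiling deck, rounded up."""
    return math.ceil(area_sq_ft / 2500)


volume = room_volume_cubic_ft(40, 30, 12)        # 14,400 ft³
print(air_scrubber_units(volume, unit_cfm=500))  # 2  (57,600 / 30,000 = 1.92)
print(ceiling_deck_samples(6000))                # 3  (6,000 / 2,500 = 2.4)
```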

	## RAG Knowledge Base

	Source documents in `RAG-KB/`:
	- FDAM v4.0.1 methodology (primary reference)
	- BNL SOP IH75190 (metals clearance thresholds)
	- IICRC/RIA/CIRI Technical Guide (wildfire restoration)
	- Lab method guides (PLM, ICP-MS)

	Chunking rules:
	- Keep tables intact (never split markdown tables)
	- Preserve headers with content
	- Include metadata (source, category, section)

	## Confidence Framework

	| Score | Level | Action |
	|-------|-------|--------|
	| ≥90% | Very High | Accept without review |
	| 70-89% | High | Accept, note in report |
	| 50-69% | Moderate | Flag for human review |
	| <50% | Low | Require human verification |
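
The table above maps onto a small lookup function. A minimal sketch, assuming scores arrive as percentages from 0 to 100; `ConfidenceRating` and `rate_confidence` are hypothetical names, and treating fractional scores such as 89.5 as belonging to the lower band is an assumption, since the table only lists whole-number ranges.

```python
from typing import Literal, NamedTuple

ConfidenceLevel = Literal["Very High", "High", "Moderate", "Low"]


class ConfidenceRating(NamedTuple):
    level: ConfidenceLevel
    action: str


def rate_confidence(score_pct: float) -> ConfidenceRating:
    """Map a 0-100 confidence score to the level and action from the table."""
    if score_pct >= 90:
        return ConfidenceRating("Very High", "Accept without review")
    if score_pct >= 70:
        return ConfidenceRating("High", "Accept, note in report")
    if score_pct >= 50:
        return ConfidenceRating("Moderate", "Flag for human review")
    return ConfidenceRating("Low", "Require human verification")


print(rate_confidence(72).action)  # Accept, note in report
```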

	## Model Loading

	All 3 models are loaded at startup (~18GB total on single L4 GPU):

	```python
	import torch
	from vllm import LLM, SamplingParams

	# Embedding and Reranker use official Qwen3VL loaders
	from scripts.qwen3_vl import Qwen3VLEmbedder, Qwen3VLReranker

	# Vision model via vLLM (single GPU, no tensor parallelism)
	vision_model = LLM(
	    model="Qwen/Qwen3-VL-4B-Thinking",
	    tensor_parallel_size=1,  # Single GPU
	    trust_remote_code=True,
	    gpu_memory_utilization=0.80,
	    max_model_len=16384,
	)

	# Both 2B models are loaded in BF16
	embedding_model = Qwen3VLEmbedder("Qwen/Qwen3-VL-Embedding-2B", torch_dtype=torch.bfloat16)
	reranker_model = Qwen3VLReranker("Qwen/Qwen3-VL-Reranker-2B", torch_dtype=torch.bfloat16)
	```

	Expected memory usage (~18GB total on single L4):
	- Vision model (4B BF16): ~10GB
	- Embedding model (2B): ~4GB
	- Reranker model (2B): ~4GB
	- Headroom: ~4GB for KV cache and overhead
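
The budget arithmetic above can be checked with a few lines. A rough sketch: the footprints are the approximate figures from the list, the names are hypothetical, and the only outside fact used is that BF16 stores 2 bytes per parameter.

```python
# Approximate per-model footprints in GB, taken from the list above.
MODEL_FOOTPRINT_GB: dict[str, float] = {
    "vision": 10.0,    # Qwen3-VL-4B-Thinking, BF16
    "embedding": 4.0,  # Qwen3-VL-Embedding-2B
    "reranker": 4.0,   # Qwen3-VL-Reranker-2B
}
L4_BUDGET_GB = 22.0


def bf16_weight_gb(params_billion: float) -> float:
    """BF16 uses 2 bytes per parameter, so N billion parameters need about 2N GB."""
    return params_billion * 2.0


def headroom_gb(budget_gb: float = L4_BUDGET_GB) -> float:
    """GB left on the GPU after loading all three models."""
    return budget_gb - sum(MODEL_FOOTPRINT_GB.values())


print(bf16_weight_gb(2))  # 4.0, matching the ~4GB listed for each 2B model
print(bf16_weight_gb(4))  # 8.0, weights only; the ~10GB vision figure is larger
print(headroom_gb())      # 4.0
```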

	## Local Development Strategy

	The RTX 4090 (24GB VRAM) can run the 4B model stack (~18GB). Two options:

	Option A: Real Models Locally
	1. Set `MOCK_MODELS=false` (or omit - defaults to false)
	2. Models will download and load (~18GB VRAM)
	3. Full inference testing locally

	Option B: Mock Models (faster iteration)
	1. Set `MOCK_MODELS=true` environment variable
	2. Mock models return realistic JSON matching the vision output schema; mock embeddings are 2048-dim, matching the real embedding model
	3. Test pipeline logic, UI, calculations without real inference
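
One way the `MOCK_MODELS` switch and a mock embedder could look is sketched below. `use_mock_models` and `MockEmbedder` are hypothetical names, not the contents of `models/`; the only details taken from this file are the environment variable, its default of false, and the 2048-dim embedding size.

```python
import os
import random


def use_mock_models() -> bool:
    """True only when MOCK_MODELS is set to "true"; unset means real models."""
    return os.environ.get("MOCK_MODELS", "false").strip().lower() == "true"


class MockEmbedder:
    """Stand-in embedder that returns deterministic 2048-dim vectors."""

    dim = 2048

    def embed(self, text: str) -> list[float]:
        # Seeding with the text makes the same input always give the same vector.
        rng = random.Random(text)
        return [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]


embedder = MockEmbedder()
print(len(embedder.embed("soot on ceiling deck")))  # 2048
```

Keeping the mock dimension equal to the real one means a ChromaDB collection built against mocks has the same shape as one built against the real embedder.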

	Deployment:
	1. Deploy to HuggingFace Spaces for production testing
	2. Request build logs after deployment to confirm success
	3. After changing embedding dimensions, rebuild ChromaDB: `python -m rag.index_builder --rebuild`

	## Code Style

	- Use `Literal["a", "b", "c"]` unions instead of Enum for simple string choices
	- Pydantic models for all input/output validation
	- Explicit return types on public functions
	- Result types or explicit error returns over thrown exceptions
	- Group imports: stdlib → third-party → local
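
A short sketch of the `Literal` and result-type conventions together. To stay dependency-free it uses dataclasses where the project would use Pydantic models, and `Ok`, `Err`, `Result`, and `parse_construction_era` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar, Union, cast, get_args

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    message: str


# A result is either a wrapped value or an explicit error, never an exception.
Result = Union[Ok[T], Err]

# Literal union instead of an Enum for a simple string choice.
ConstructionEra = Literal["pre-1980", "1980-2000", "post-2000"]


def parse_construction_era(raw: str) -> Result[ConstructionEra]:
    """Normalize user input and return Ok(era) or Err(message)."""
    cleaned = raw.strip().lower()
    if cleaned in get_args(ConstructionEra):
        return Ok(cast(ConstructionEra, cleaned))
    return Err(f"unknown construction era: {raw!r}")


print(parse_construction_era(" Pre-1980 "))  # Ok(value='pre-1980')
print(parse_construction_era("1975"))        # Err(message="unknown construction era: '1975'")
```

Callers branch on `isinstance(result, Err)`, which keeps the failure path visible in the return type.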

	## WSL Note

	Dev servers must be exposed for WSL access. With Gradio, bind to `0.0.0.0` through the `server_name` argument of `launch()` (the equivalent of a `--host 0.0.0.0` flag):
	```python
	app.launch(server_name="0.0.0.0", server_port=7860)
	```