Dyuti Dasmahapatra committed on
Commit
a090f9b
·
1 Parent(s): 0101a8b

docs: document added models (ResNet, Swin, DeiT, EfficientNet) and EfficientNet fallback

Files changed (3)
  1. PROJECT_SUMMARY.md +1 -1
  2. QUICKSTART.md +6 -2
  3. README.md +15 -7
PROJECT_SUMMARY.md CHANGED
@@ -279,7 +279,7 @@ To understand the codebase:

  Things you might want to add later:

- - [ ] More ViT model variants (DeiT, BEiT, Swin)
+ - [x] More ViT model variants (DeiT, Swin) — added ResNet, Swin, DeiT, EfficientNet support in `model_loader.py`
  - [ ] Batch image processing
  - [ ] Export results as PDF report
  - [ ] Save/load analysis sessions
QUICKSTART.md CHANGED
@@ -97,9 +97,13 @@ http://localhost:7860

  ### Step 3: Load a Model

- 1. In the **"Select Model"** dropdown, choose `ViT-Base`
+ 1. In the **"Select Model"** dropdown, choose a model (examples: `ViT-Base`, `ViT-Large`, `ResNet-50`, `Swin Transformer`, `DeiT`, `EfficientNet`)
  2. Click the **"🔄 Load Model"** button
- 3. Wait for the confirmation: `✅ Model loaded: google/vit-base-patch16-224`
+ 3. Wait for the confirmation message, e.g. `✅ Model loaded: google/vit-base-patch16-224`
+
+ Notes:
+ - For ViT/DeiT models you can use Attention Visualization (patch-level attention maps). For ResNet, Swin, and EfficientNet, use GradCAM or GradientSHAP (the UI will still show the attention option, but attention maps are ViT-specific).
+ - EfficientNet may fall back to a `timm` loader automatically if the Hugging Face download triggers a torch security restriction; no torch upgrade is required.

  ### Step 4: Analyze Your First Image
 
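The fallback behavior mentioned in the QUICKSTART notes can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the real logic lives in `src/model_loader.py`, and the names `load_with_fallback`, `hf_loader`, and `timm_loader` are assumptions made here for clarity.

```python
def load_with_fallback(model_id, hf_loader, timm_loader):
    """Try the Hugging Face loader first; fall back to timm if it raises.

    Hypothetical sketch of the fallback described in the notes above.
    The loaders are injected as callables so the control flow can be
    exercised without downloading any weights.
    """
    try:
        return hf_loader(model_id), "huggingface"
    except Exception:
        # e.g. torch.load rejecting the checkpoint format on an older
        # torch version; timm has its own pretrained-weight loading path,
        # so no torch upgrade is needed.
        return timm_loader(model_id), "timm"
```

In practice `hf_loader` would wrap something like `AutoModelForImageClassification.from_pretrained` and `timm_loader` would wrap `timm.create_model(..., pretrained=True)`; injecting them keeps the fallback decision testable in isolation.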
README.md CHANGED
@@ -373,16 +373,24 @@ Compares performance across subgroups to identify:

  ---

- ## 🔧 Supported Models
-
- Currently supported Vision Transformer models from Hugging Face:
-
- | Model | Parameters | Input Size | Accuracy (ImageNet) |
- |-------|-----------|------------|---------------------|
- | `google/vit-base-patch16-224` | 86M | 224×224 | ~81.3% |
- | `google/vit-large-patch16-224` | 304M | 224×224 | ~82.6% |
-
- **Easy to extend**: Add any Hugging Face ViT model to `src/model_loader.py`
+ ### 🔧 Supported Models
+
+ The dashboard now supports multiple architectures (the ViT family and others). The models currently exposed in the UI are:
+
+ | Display name | Hugging Face ID | Notes |
+ |--------------|-----------------|-------|
+ | ViT-Base | `google/vit-base-patch16-224` | ViT — attention visualizations and GradCAM supported |
+ | ViT-Large | `google/vit-large-patch16-224` | ViT — attention visualizations and GradCAM supported |
+ | ResNet-50 | `microsoft/resnet-50` | CNN — GradCAM supported; attention visualization not applicable |
+ | Swin Transformer | `microsoft/swin-base-patch4-window7-224` | Swin — GradCAM supported; attention visualization is limited to ViT-style models |
+ | DeiT | `facebook/deit-base-patch16-224` | ViT-like — attention visualizations and GradCAM supported |
+ | EfficientNet-B7 | `google/efficientnet-b7` | CNN — loaded via Hugging Face when possible; if loading triggers a torch.load restriction, the app falls back to `timm` (no torch upgrade required). GradCAM supported; attention visualization not applicable |
+
+ Notes:
+ - Attention visualizations (patch-level attention maps) are meaningful for ViT-style models (ViT, DeiT). For CNNs (ResNet, EfficientNet) and some hierarchical transformers (Swin), the dashboard uses GradCAM or a last-conv fallback instead of patch attention.
+ - EfficientNet on the Hugging Face Hub can trigger a torch.load security restriction in older torch versions. The toolkit transparently falls back to a `timm`-based loader to avoid requiring a torch upgrade; this is handled automatically in `src/model_loader.py`.
+
+ **Easy to extend**: Add more models to `src/model_loader.py` under `SUPPORTED_MODELS` and they will appear in the app dropdown.

  ---
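The extension point the README describes can be sketched as follows. The `SUPPORTED_MODELS` name comes from the README text above, but the per-entry schema and the `available_explainers` helper are hypothetical, shown only to illustrate how the table of models and the "attention is ViT-specific" rule might fit together.

```python
# Hypothetical shape of the SUPPORTED_MODELS registry (schema assumed):
# display name -> Hugging Face ID plus a coarse architecture family tag.
SUPPORTED_MODELS = {
    "ViT-Base": {"hf_id": "google/vit-base-patch16-224", "family": "vit"},
    "ViT-Large": {"hf_id": "google/vit-large-patch16-224", "family": "vit"},
    "ResNet-50": {"hf_id": "microsoft/resnet-50", "family": "cnn"},
    "Swin Transformer": {"hf_id": "microsoft/swin-base-patch4-window7-224", "family": "swin"},
    "DeiT": {"hf_id": "facebook/deit-base-patch16-224", "family": "vit"},
    "EfficientNet-B7": {"hf_id": "google/efficientnet-b7", "family": "cnn"},
}

def available_explainers(display_name):
    """Patch-level attention maps only make sense for ViT-style models;
    CNNs and hierarchical transformers get gradient-based methods."""
    family = SUPPORTED_MODELS[display_name]["family"]
    if family == "vit":
        return ["attention", "gradcam", "gradient_shap"]
    return ["gradcam", "gradient_shap"]
```

With a registry like this, adding a new model is one dict entry, and the UI dropdown and explainer menu can both be derived from it.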