# Coral Re-Identification Models
Fine-tuned models for underwater coral individual re-identification across multiple years.
This repository contains the two best models and standalone inference scripts. No dependency on the training codebase (coral_reid) is required.
Source code: GitHub
## Best Models

### Best Accuracy: E3-01b DINOv2 ViT-B/14

| Metric | Value |
|---|---|
| N-Benchmark Top-1 | 86.6% (110/127) |
| Top-3 / Top-5 / Top-10 | 96.9% / 97.6% / 100.0% |
| Avg Rank / Worst Rank | 1.30 / 9 |
| Backbone | DINOv2 ViT-B/14 (86.6M params, timm 518×518) |
| Loss | Triplet (margin=0.3) + Hard Mining |
| Sampler | AreaAwareSampler (area_ratio=0.75) |
| Training | 4-phase progressive unfreezing, 56 epochs, ~7.2h |
| Embedding | 1280-d, L2-normalized |
| Files | e3_01b_dinov2_vitb_best/ |
### Most Efficient: DINOv3 ViT-S+/16

| Metric | Value |
|---|---|
| N-Benchmark Top-1 | 81.1% (103/127) |
| Top-3 / Top-5 / Top-10 | 92.1% / 95.3% / 99.2% |
| Avg Rank | 1.61 |
| Backbone | DINOv3 ViT-S+/16 (~22M params, timm 512×512) |
| Loss | Triplet (margin=0.3) + Hard Mining |
| Sampler | MPerClassSampler (m=2) |
| Training | 4-phase progressive unfreezing, 63 epochs, ~2.0h |
| Embedding | 768-d, L2-normalized |
| Files | dinov3_vitsplus_efficient/ |
### Comparison
| Metric | Best Accuracy | Most Efficient | Difference |
|---|---|---|---|
| Top-1 | 86.6% | 81.1% | -5.5 pp |
| Parameters | ~86.6M | ~22M | -75% |
| Model size | 339 MB | 112 MB | -67% |
| Training time | ~7.2h | ~2.0h | -72% |
| Inference tokens | 1369 (patch14) | 1024 (patch16) | -25% |
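The inference-token row follows directly from the patch grid each backbone produces at its training resolution; a quick sketch of the arithmetic:

```python
# Patch-token counts for the two backbones (values match the table above).
def patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for a square input."""
    side = image_size // patch_size   # patches per side
    return side * side

vitb_tokens = patch_tokens(518, 14)   # DINOv2 ViT-B/14 at 518x518
vits_tokens = patch_tokens(512, 16)   # DINOv3 ViT-S+/16 at 512x512
print(vitb_tokens, vits_tokens)       # 1369 1024
```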
## Quick Start

```bash
# Install dependencies (standalone, no coral_reid needed)
uv sync

# Extract features from a single image
uv run python extract_features.py \
  --model e3_01b_dinov2_vitb_best/best_model_20260308_110634.pt \
  --input /path/to/image.jpg

# Extract features from a directory
uv run python extract_features.py \
  --model e3_01b_dinov2_vitb_best/best_model_20260308_110634.pt \
  --input /path/to/images/ \
  --output features.h5
```
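Once features are extracted, re-identification is a nearest-neighbour search over the L2-normalized embeddings, where cosine similarity reduces to a dot product. A minimal sketch with a tiny synthetic gallery; for real use, load the arrays from `features.h5` with `h5py` (the dataset names inside that file are not documented here, so check the script's output format first):

```python
import numpy as np

def top_k(feats: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k most similar gallery embeddings.
    Embeddings are L2-normalized, so cosine similarity is a dot product."""
    sims = feats @ query
    return np.argsort(-sims)[:k]

# Tiny synthetic gallery of unit vectors, standing in for real embeddings.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]])
print(top_k(gallery, np.array([1.0, 0.0]), k=2))   # -> [0 2]
```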
## N-Benchmark Per-Area Results

### E3-01b DINOv2 ViT-B/14 (Best)
| Area | Queries | Top-1 | Top-3 | Top-5 | Avg Rank |
|---|---|---|---|---|---|
| 37 | 32 | 93.8% | 96.9% | 96.9% | 1.28 |
| 38 | 31 | 80.6% | 100.0% | 100.0% | 1.19 |
| 39 | 27 | 85.2% | 92.6% | 96.3% | 1.44 |
| 40 | 37 | 86.5% | 97.3% | 97.3% | 1.30 |
| Overall | 127 | 86.6% | 96.9% | 97.6% | 1.30 |
### DINOv3 ViT-S+/16 (Efficient)
| Area | Queries | Top-1 | Top-3 | Top-5 | Avg Rank |
|---|---|---|---|---|---|
| 37 | 32 | 81.2% | 93.8% | 96.9% | 1.56 |
| 38 | 31 | 77.4% | 90.3% | 93.5% | 1.90 |
| 39 | 27 | 85.2% | 92.6% | 96.3% | 1.37 |
| 40 | 37 | 81.1% | 91.9% | 94.6% | 1.57 |
| Overall | 127 | 81.1% | 92.1% | 95.3% | 1.61 |
## Full Model History

### Model Comparison Table
| Model Name | Arch | Backbone | Loss | Mining | Same Area Neg | Image | Test Acc | Test Loss | Val Loss | N-Bench Avg | A37 | A38 | A39 | A40 | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pre-trained | - | DINOv2-B/14 | - | - | - | bbox | - | - | - | 29.48% | 28.12% | 35.48% | 29.63% | 24.32% | - |
| 20250812_152526 | old | DINOv2-B/14 | Triplet | pre-composed | ✗ | bbox | 92.6% | 0.1659 | - | 48.25% | 50.00% | 51.61% | 48.15% | 43.24% | ~16h |
| 20251007_133126 | old | DINOv2-B/14 | Triplet | pre-composed | ✗ | bbox | 88.8% | 0.2523 | - | 39.32% | 46.88% | 41.94% | 33.33% | 35.14% | ~16h |
| 20251008_094017 | old | DINOv2-B/14 | Triplet | pre-composed | ✗ | bbox | 90.4% | 0.1636 | - | 40.19% | 37.50% | 48.39% | 37.04% | 37.84% | ~16h |
| 20251014_183603 | old | DINOv2-B/14 | Triplet | pre-composed | ✗ | bbox | 92.8% | 0.1012 | - | 40.97% | 37.50% | 38.71% | 44.44% | 43.24% | ~16h |
| Pre-trained | - | DINOv2-B/14 | - | - | - | whole | - | - | - | 50.88% | 34.38% | 54.84% | 62.96% | 51.35% | - |
| 20251015_165008 | old | DINOv2-B/14 | Triplet | pre-composed | ✓ | whole | 92.7% | 0.1330 | 0.1006 | 64.43% | 62.50% | 61.29% | 55.56% | 78.38% | ~16h |
| 20251016_133229 | old | DINOv2-B/14 | Triplet | pre-composed | ✗ | whole | 97.9% | 0.0429 | - | 63.31% | 56.25% | 58.06% | 74.07% | 64.86% | ~16h |
| 20260308_110634 | new | DINOv2-B/14 (timm 518) | Triplet | dynamic (PML) | AreaAware 0.75 | whole | - | - | 0.1604 | 86.6% | 93.8% | 80.6% | 85.2% | 86.5% | ~7.2h |
| 20260306_233824 | new | DINOv3-S+/16 (timm 512) | Triplet | dynamic (PML) | ✗ | whole | - | - | 0.1604 | 81.1% | 81.2% | 77.4% | 85.2% | 81.1% | ~2.0h |
### Column Descriptions
| Column | Description |
|---|---|
| Arch | old = old_repo implementation, new = refactored modular architecture |
| Backbone | Feature extractor (DINOv2-B/14, DINOv3-S+/16, etc.) |
| Loss | Loss function (Triplet, ArcFace, CosFace, Circle, Contrastive, etc.) |
| Mining | Sample mining: pre-composed = fixed triplets, dynamic (PML) = MPerClassSampler |
| Same Area Neg | Whether negatives restricted to same geographic area (AreaAware 0.75 = 75% same area) |
| Image | bbox = EXIF bounding box crop, whole = full image |
| Test Acc | Test set accuracy (old arch only, measures pos_dist < neg_dist) |
| Val Loss | Best validation loss during training |
| N-Bench Avg | N-Benchmark Top-1 accuracy averaged across areas 37-40 |
### Architecture Differences
| Feature | Old Architecture | New Architecture |
|---|---|---|
| Dataset Output | (anchor, pos, neg) - 3 images | (image, label) - 1 image |
| Triplet Formation | Pre-composed before training | Dynamic mining per batch |
| Batch Sampler | Random | MPerClassSampler (m=2) |
| Loss Function | Custom TripletLossWithMining | PML TripletMarginLoss |
| Samples per Epoch | ~50,000 triplets x 3 images | ~4,000 images |
| Training Speed | ~23 min/epoch | ~1.5 min/epoch |
| Same Area Negatives | Implemented | Implemented (AreaAwareSampler) |
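The dynamic-mining row can be illustrated with the batch-hard rule: for each anchor in a batch, take the farthest same-label sample as the positive and the closest other-label sample as the negative. The repo uses PML's `TripletMarginLoss` for this; the numpy version below is an illustrative sketch of the rule only, not the actual implementation:

```python
import numpy as np

def batch_hard_triplet_loss(emb: np.ndarray, labels: np.ndarray,
                            margin: float = 0.3) -> float:
    """Batch-hard triplet loss on a batch of embeddings (numpy sketch).
    For each anchor: hardest positive = farthest same-label sample,
    hardest negative = closest other-label sample."""
    # Pairwise Euclidean distances, shape (N, N).
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(emb)):
        pos = same[i].copy()
        pos[i] = False                # an anchor is not its own positive
        neg = ~same[i]
        if not pos.any() or not neg.any():
            continue                  # no valid triplet for this anchor
        d_ap = d[i][pos].max()        # hardest positive
        d_an = d[i][neg].min()        # hardest negative
        losses.append(max(0.0, d_ap - d_an + margin))
    return float(np.mean(losses)) if losses else 0.0

# A well-separated batch (two tight classes) incurs zero loss at margin 0.3.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(batch_hard_triplet_loss(emb, labels))   # -> 0.0
```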
**N-Benchmark (Nearest Benchmark):** Top-1 accuracy of retrieving the same coral individual across the 2022 and 2023 surveys, evaluated in areas 37-40.
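Under that protocol, Top-1 accuracy is simply the fraction of queries (one year) whose nearest gallery embedding (the other year) carries the same individual ID. A minimal sketch; the function name and array layout are assumptions for illustration, not the repo's API:

```python
import numpy as np

def top1_accuracy(query_emb: np.ndarray, query_ids: np.ndarray,
                  gallery_emb: np.ndarray, gallery_ids: np.ndarray) -> float:
    """Fraction of queries whose nearest gallery embedding shares their ID.
    Embeddings are L2-normalized, so the dot product is cosine similarity."""
    sims = query_emb @ gallery_emb.T      # (n_query, n_gallery) similarities
    nearest = sims.argmax(axis=1)         # best gallery match per query
    return float((gallery_ids[nearest] == query_ids).mean())
```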
## Project Structure

```
coral_models/
├── pyproject.toml                  # uv environment (standalone)
├── extract_features.py             # Feature extraction script
├── e3_01b_dinov2_vitb_best/        # Best accuracy model (86.6%)
│   ├── best_model_20260308_110634.pt
│   ├── final_model_20260308_110634.pt
│   ├── e3_01b_same_area_neg_075.yaml
│   ├── README.md
│   └── README(zh-tw).md
├── dinov3_vitsplus_efficient/      # Most efficient model (81.1%)
│   ├── best_model_20260306_233824.pt
│   ├── final_model_20260306_233824.pt
│   ├── dinov3_vitsplus_tune_02_p2lr5_4ph.yaml
│   ├── README.md
│   └── README(zh-tw).md
└── legacy/                         # Old architecture models (torch.hub, 224×224)
    ├── dinov2_coral_best_model_20251015_165008.pt   # 64.43% (old best, same area neg, whole image)
    └── dinov2_coral_best_model_20251016_133229.pt   # 63.31% (no same area neg, whole image)
```
## License
This project is licensed under GPL-3.0.
Based on DINOv2 and DINOv3 by Meta Platforms, Inc. (Apache License 2.0).