MUNIN: Metric-learning Unit for Non-invasive Individual Naming
Lightweight few-shot learning for acoustic individual identification in birds. A ResNet18 encoder trained from scratch on mel spectrograms using episodic prototypical learning.
Key Result
MUNIN (11M parameters, 512-d embeddings) performs on par with BirdNET (pretrained on 6,000+ species, 1024-d) for individual bird identification: it leads at 1-shot, trails by 1.6 pp at 3-shot, and is statistically equivalent at 5-shot.
| Setting | MUNIN | BirdNET | Perch |
|---|---|---|---|
| 1-shot | 85.0% | 81.4% | 80.4% |
| 3-shot | 89.9% | 91.5% | 90.8% |
| 5-shot | 93.9% | 94.1% | 92.8% |
A TOST (two one-sided tests) equivalence test at 5-shot supports equivalence within a ±2 pp margin (p = 0.0013). All results are evaluated on 9 held-out individuals across 3 species.
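The README does not say how the TOST was set up, so the following is only a minimal sketch of the general procedure. It assumes paired per-run accuracy differences and a large-sample normal approximation. The function name `tost_paired` and the numbers in `diffs` are hypothetical placeholders, not MUNIN results.

```python
import math
from statistics import NormalDist, mean, stdev

def tost_paired(diffs, margin):
    """Two one-sided z-tests on a mean paired difference.

    Returns the TOST p-value, i.e. the larger of the two one-sided
    p-values. Equivalence within +/-margin is claimed when it is
    below the chosen significance level.
    """
    n = len(diffs)
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)       # standard error of the mean difference
    z_lower = (d + margin) / se            # tests H0: true difference <= -margin
    z_upper = (d - margin) / se            # tests H0: true difference >= +margin
    p_lower = 1.0 - NormalDist().cdf(z_lower)
    p_upper = NormalDist().cdf(z_upper)
    return max(p_lower, p_upper)

# ASSUMPTION: made-up accuracy differences (model A minus model B), for illustration only.
diffs = [0.01, -0.01, 0.0, 0.005, -0.005] * 8
p_value = tost_paired(diffs, margin=0.02)  # margin of 2 pp, as in the text above
equivalent = p_value < 0.05
```

With few runs, a t-distribution would be the more appropriate reference than the normal approximation used here.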
Checkpoints
| File | Description |
|---|---|
| | MUNIN flagship: best 5-shot model (93.9%) |
| | 3-shot variant (89.9%) |
| | 1-shot variant (85.0%, leads BirdNET) |
Usage
Input Format
- Mono audio at 22050 Hz
- Mel spectrogram: 128 bins, 1.5s clips (65 frames)
- Shape:
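As a sanity check on the figures above, the 65-frame count is consistent with a hop length of 512 samples and centered framing. Both are assumptions here (they match common library defaults), since the README does not state the STFT parameters.

```python
SAMPLE_RATE = 22050   # Hz, from the input format above
CLIP_SECONDS = 1.5    # clip length, from the input format above
HOP_LENGTH = 512      # ASSUMPTION: not stated in this README

def num_frames(n_samples: int, hop_length: int) -> int:
    """Frame count for a centered STFT: one frame per full hop, plus one."""
    return 1 + n_samples // hop_length

n_samples = int(SAMPLE_RATE * CLIP_SECONDS)
frames = num_frames(n_samples, HOP_LENGTH)
print(n_samples, frames)  # prints "33075 65"
```

A different hop length or non-centered framing would give a different frame count, so clips should be prepared with the same settings the checkpoints were trained with.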
Training
- Episodic prototypical learning (5-way 5-shot)
- 27 training individuals across 3 species (cockatoo, penguin, little owl)
- 50 epochs, 200 episodes/epoch, cosine annealing LR
- Trained on a consumer GPU (RTX 4070, ~15 min)
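The episodic setup above can be sketched as follows. This is a simplified illustration, not the project's training code: the helper names are hypothetical, and toy 2-d vectors stand in for the 512-d embeddings the encoder would produce.

```python
import math
import random

N_WAY, K_SHOT = 5, 5  # episode shape from the training list above

def sample_episode(clips_by_individual, n_way, k_shot, n_query, rng):
    """Pick n_way individuals, then split their clips into support and query."""
    individuals = rng.sample(sorted(clips_by_individual), n_way)
    support, query = {}, []
    for ind in individuals:
        clips = rng.sample(clips_by_individual[ind], k_shot + n_query)
        support[ind] = clips[:k_shot]
        query += [(clip, ind) for clip in clips[k_shot:]]
    return support, query

def prototypes(support):
    """Class prototype = element-wise mean of that class's support embeddings."""
    return {
        ind: [sum(col) / len(vecs) for col in zip(*vecs)]
        for ind, vecs in support.items()
    }

def classify(embedding, protos):
    """Assign the label of the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda ind: math.dist(embedding, protos[ind]))

# ASSUMPTION: synthetic, well-separated 2-d "embeddings" for six toy individuals.
rng = random.Random(0)
toy = {
    f"bird_{i}": [[10.0 * i + rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(10)]
    for i in range(6)
}
support, query = sample_episode(toy, N_WAY, K_SHOT, n_query=3, rng=rng)
protos = prototypes(support)
accuracy = sum(classify(x, protos) == y for x, y in query) / len(query)
```

During training, the negative distances to each prototype would typically be fed to a softmax cross-entropy loss so the encoder learns to cluster clips by individual. That step is omitted here.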