MUNIN: Metric-learning Unit for Non-invasive Individual Naming

Lightweight few-shot learning for acoustic individual identification in birds. A ResNet18 encoder trained from scratch on mel spectrograms using episodic prototypical learning.

Key Result

MUNIN (11M parameters, 512-d embeddings) achieves parity with BirdNET (pretrained on 6,000+ species, 1024-d embeddings) for individual bird identification. Results by number of shots, with Perch as an additional baseline:

Setting   MUNIN   BirdNET   Perch
1-shot    85.0%   81.4%     80.4%
3-shot    89.9%   91.5%     90.8%
5-shot    93.9%   94.1%     92.8%

At 5-shot, a TOST equivalence test is significant within a ±2 pp margin (p = 0.0013). All results are evaluated on 9 held-out individuals across 3 species.
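This card does not say how the TOST (two one-sided tests) result was computed. The sketch below only illustrates the logic of an equivalence test, using a normal approximation. It assumes the comparison is MUNIN against BirdNET at 5-shot. The standard error `se_pp` is a hypothetical placeholder, not a value from the paper, so the printed p-value is not expected to reproduce p = 0.0013.

```python
from statistics import NormalDist

def tost_p_value(diff, se, margin):
    """TOST p-value under a normal approximation.

    diff   : observed difference between two methods (percentage points)
    se     : standard error of that difference
    margin : equivalence margin; equivalence means -margin < true diff < +margin
    """
    z = NormalDist()
    # One-sided test against H0: true diff <= -margin
    p_lower = 1.0 - z.cdf((diff + margin) / se)
    # One-sided test against H0: true diff >= +margin
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject,
    # so the overall p-value is the larger of the two.
    return max(p_lower, p_upper)

diff_pp = 93.9 - 94.1   # MUNIN minus BirdNET at 5-shot, from the table above
se_pp = 0.8             # HYPOTHETICAL standard error, for illustration only
margin_pp = 2.0         # the +/-2 pp margin stated in this card

p_value = tost_p_value(diff_pp, se_pp, margin_pp)
print(f"TOST p-value (illustrative): {p_value:.4f}")
```

A small p-value supports the claim that the two methods differ by less than the margin. A large one means equivalence cannot be claimed, which is not the same as showing a difference.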

Checkpoints

File Description
MUNIN flagship -- best 5-shot model (93.9%)
3-shot variant (89.9%)
1-shot variant (85.0%, leads BirdNET)

Usage

Input Format

  • Mono audio at 22050 Hz
  • Mel spectrogram: 128 mel bins, 1.5 s clips (65 frames)
  • Shape:
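As a sanity check on the numbers above, this minimal sketch shows how a 1.5 s clip at 22050 Hz yields 65 frames. It assumes a centered STFT with a hop length of 512 samples. The hop length is not stated in this card; 512 is an assumption that happens to be consistent with the 65-frame figure. The sketch covers only the (mel bins, frames) part of the input and makes no claim about batch or channel dimensions.

```python
SAMPLE_RATE = 22050   # Hz, from this card
CLIP_SECONDS = 1.5    # from this card
N_MELS = 128          # from this card
HOP_LENGTH = 512      # ASSUMPTION: not stated in this card

def n_frames(n_samples, hop_length):
    """Frame count of a centered STFT (signal padded so that frame k
    is centered on sample k * hop_length)."""
    return 1 + n_samples // hop_length

n_samples = int(SAMPLE_RATE * CLIP_SECONDS)
spec_shape = (N_MELS, n_frames(n_samples, HOP_LENGTH))
print(n_samples, spec_shape)  # 33075 (128, 65)
```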

Training

  • Episodic prototypical learning (5-way 5-shot)
  • 27 training individuals across 3 species (cockatoo, penguin, little owl)
  • 50 epochs, 200 episodes/epoch, cosine annealing LR
  • Trained on a consumer GPU (RTX 4070, ~15 min)
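The sketch below illustrates what one prototypical episode computes. It is a generic pure-Python version of prototypical classification, not MUNIN's training code. The ResNet18 encoder is replaced by synthetic 8-d embeddings, and the squared Euclidean distance, the function names, and the toy data are all assumptions made for illustration.

```python
import math
import random

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototypes(support):
    """support: dict of individual id -> list of support embeddings.
    Each prototype is the mean of that individual's support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def classify(query, protos):
    """Return (predicted label, log-probabilities), where the logits are
    negative squared distances to each prototype."""
    labels = list(protos)
    logits = [-sq_dist(query, protos[l]) for l in labels]
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    log_probs = {l: z - lse for l, z in zip(labels, logits)}
    return max(log_probs, key=log_probs.get), log_probs

def episode_loss(support, queries):
    """Mean cross-entropy over (embedding, true label) query pairs."""
    protos = prototypes(support)
    return -sum(classify(q, protos)[1][y] for q, y in queries) / len(queries)

# Toy 5-way 5-shot episode with synthetic embeddings (stand-ins for encoder output).
rng = random.Random(0)
N_WAY, K_SHOT, N_QUERY, DIM = 5, 5, 3, 8

def sample(k):
    """Noisy point around a class-specific center (synthetic data)."""
    return [(3.0 if i == k else 0.0) + rng.gauss(0.0, 0.1) for i in range(DIM)]

support = {k: [sample(k) for _ in range(K_SHOT)] for k in range(N_WAY)}
queries = [(sample(k), k) for k in range(N_WAY) for _ in range(N_QUERY)]

protos = prototypes(support)
accuracy = sum(classify(q, protos)[0] == y for q, y in queries) / len(queries)
loss = episode_loss(support, queries)
print(f"accuracy={accuracy:.2f} loss={loss:.6f}")
```

In training, the loss from each episode would be backpropagated through the encoder. At inference the same nearest-prototype rule applies, with 1, 3, or 5 enrolled clips per individual serving as the support set.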

Paper

Preprint: MUNIN: Lightweight few-shot learning achieves parity with large pretrained encoders for acoustic individual identification in birds

Citation
