OceanBEATs

OceanBEATs is a foundation model for underwater acoustic monitoring, adapted from BEATs via Domain-Adaptive Pretraining (DAPT) on a 5,673-h global ocean soundscape corpus (World-DAPT).

This model serves as the "ears" of the underwater soundscape analysis pipeline described in our paper: "Discovery and promotion of unknown sounds into operational detection targets for underwater passive acoustic monitoring under false alarm constraints" (Scientific Reports, revision in review).

About this revision (May 2026). The original December 2025 release used SimCLR/InfoNCE-based DAPT under AMP fp16, which suffered a numerical instability that prevented any update to the BEATs encoder weights; the released beats_dapt_topup_encoder.pt was therefore byte-identical to Microsoft's BEATs AS-2M PRETRAIN checkpoint. The current revision corrects this by re-running DAPT with Masked Audio Modeling (MAM) and a k-means (k=1024) tokeniser under bfloat16 precision on a larger, 5,673-h World-DAPT corpus. The superseded buggy weights have been removed from this repository (see "Reproducibility of the original (buggy) state" below for how to recreate them if needed).
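The exact mechanism of the fp16 instability is not detailed here. As background only, the sketch below computes the largest finite value of each format from its exponent and mantissa widths (5 and 10 bits for IEEE fp16, 8 and 7 bits for bfloat16); the much wider range of bfloat16 is a common reason for preferring it in mixed-precision training. The helper name `max_finite` is hypothetical.

```python
def max_finite(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value of an IEEE-754-style binary floating-point format."""
    # The all-ones exponent field is reserved for inf/NaN, so the largest
    # usable unbiased exponent equals the bias itself.
    bias = 2 ** (exp_bits - 1) - 1
    return (2.0 - 2.0 ** -mant_bits) * 2.0 ** bias


FP16_MAX = max_finite(exp_bits=5, mant_bits=10)  # IEEE half precision
BF16_MAX = max_finite(exp_bits=8, mant_bits=7)   # bfloat16 (float32 exponent range)

print(f"fp16 max finite: {FP16_MAX}")
print(f"bf16 max finite: {BF16_MAX:.3e}")
```

Values beyond the fp16 limit overflow to infinity, which can then propagate as NaN through later operations.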

Model Details (current canonical revision)

Architecture: BEATs (Audio Transformer; Microsoft)
Self-supervised pretraining: Masked Audio Modeling (MAM) with k=1024 k-means cluster labels on PRETRAIN BEATs patch features
DAPT corpus: World-DAPT, 5,673 h (NOAA SanctSound, US Navy USWTR, NOAA NRS/ONMS, ICListen / ONC, NPS Glacier Bay, PALAOA Ekström Ice Shelf)
Optimisation: AdamW, batch 16, learning rate 1e-4 (cosine decay), bfloat16 precision, 1 epoch (126,365 steps); checkpoint BEATs_DAPT_MAM_step120000.pt selected by best 56-class SED validation
Input: 16 kHz mono waveform
Backbone init: BEATs AS-2M (iter3+)
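As a rough illustration of the MAM objective listed above, the sketch below assigns each patch feature to its nearest k-means centroid to obtain a discrete target, then averages cross-entropy over masked patches only. It is a pure-Python toy, not the training code: the function names, the tiny codebook (4 centroids instead of 1024) and the squared-Euclidean assignment are assumptions made for illustration.

```python
import math


def nearest_centroid(feature, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(feature, centroid))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def masked_cross_entropy(logits, labels, mask):
    """Mean cross-entropy over the positions where mask is True."""
    total, count = 0.0, 0
    for row, label, is_masked in zip(logits, labels, mask):
        if not is_masked:
            continue  # visible patches do not contribute to the loss
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0


# Toy codebook: 4 centroids in 2-D (the model card uses k=1024).
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_features = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]
labels = [nearest_centroid(f, centroids) for f in patch_features]

mask = [True, False, True]  # patches hidden from the encoder
logits = [[0.0] * len(centroids) for _ in labels]  # untrained, uniform predictions
loss = masked_cross_entropy(logits, labels, mask)
```

With uniform predictions the loss equals log(k), which gives a convenient baseline when monitoring whether training is making progress.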

Available files (current canonical revision)

File	SHA-256	Size
`beats_dapt_mam_step120000.pt`	`0fe9f7dd92780c2e564f1df06a192482dbcb9a56bdab4202f4d94862b9168f89`	361 MB
`sed_head_56_fulldata_ep8.pt`	`135d11738a6619a57769955468ce5cb6eee3f07044fa45e6c950bf25ac4f8f60`	18 MB

sed_head_56_fulldata_ep8.pt is the 56-class SED head trained on top of the above encoder. Single-seed Event F1 = 0.483; n=10 mean ± std = 0.475 ± 0.017 (per manuscript Table 1 footnote).

Reproducibility of the original (buggy) state

The original December 2025 release contained two files that have been removed in the current revision:

Removed file	Replacement / how to recreate
`beats_dapt_topup_encoder.pt`	Byte-identical to Microsoft's BEATs AS-2M PRETRAIN checkpoint (`BEATs_iter3_plus_AS2M.pt`, same SHA-256); use that file directly to reproduce the original buggy DAPT-encoder behaviour.
`sed_head_56_topup_ep8.pt`	Re-train a 56-class SED head on top of the BEATs AS-2M PRETRAIN encoder using the protocol in Methods §4.2.3 of the manuscript (`scripts/train_sed_beats_weak_plus.py` in the GitHub repository).

The history of the removed files is preserved in the Hugging Face repository commit log. The SHA-256 of the deleted encoder was identical to that of Microsoft's BEATs_iter3_plus_AS2M.pt because the AMP fp16 NaN bug prevented any weight updates.
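A check along the following lines can flag this failure mode after a training run by comparing the saved checkpoint with its initialisation byte for byte. It is a sketch only: the function name and the paths in the usage comment are placeholders, and identical bytes indicate unchanged weights only when both files were serialised in the same way.

```python
import filecmp


def weights_unchanged(init_path: str, trained_path: str) -> bool:
    """Return True if the two checkpoint files have identical contents."""
    # shallow=False forces a content comparison instead of trusting os.stat() metadata.
    return filecmp.cmp(init_path, trained_path, shallow=False)


# Example with placeholder paths:
# if weights_unchanged("BEATs_iter3_plus_AS2M.pt", "weights/my_dapt_encoder.pt"):
#     raise RuntimeError("DAPT produced no weight updates")
```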

Usage

These weights are designed to be used with the official code repository:

GitHub Repository: alohajazz/openworld-soundscape-cced2-dgpu

```python
from huggingface_hub import hf_hub_download

# Download canonical revision weights
encoder_path = hf_hub_download(
    repo_id="BiologgingSolutions/OceanBEATs",
    filename="beats_dapt_mam_step120000.pt",
    local_dir="weights/",
)
sed_head_path = hf_hub_download(
    repo_id="BiologgingSolutions/OceanBEATs",
    filename="sed_head_56_fulldata_ep8.pt",
    local_dir="weights/",
)
```

After download, verify the SHA-256 fingerprints against the values listed above to ensure file integrity.
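One way to run that check is sketched below using only the Python standard library. The expected digests are copied from the "Available files" table; the function names and the default `weights/` directory are assumptions that mirror the download example.

```python
import hashlib
from pathlib import Path

# Digests copied from the "Available files" table.
EXPECTED_SHA256 = {
    "beats_dapt_mam_step120000.pt": "0fe9f7dd92780c2e564f1df06a192482dbcb9a56bdab4202f4d94862b9168f89",
    "sed_head_56_fulldata_ep8.pt": "135d11738a6619a57769955468ce5cb6eee3f07044fa45e6c950bf25ac4f8f60",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large checkpoints stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(weights_dir="weights/"):
    """Map each expected filename to True if its digest matches, else False."""
    return {
        name: sha256_of(Path(weights_dir) / name) == expected
        for name, expected in EXPECTED_SHA256.items()
    }


# Example, after the files have been downloaded:
# print(verify_weights("weights/"))
```

A `False` entry suggests a truncated or modified download; fetching the file again is the usual first step.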

License & Data Availability

License: CC BY 4.0 (Creative Commons Attribution 4.0 International)

These weights are released under CC BY 4.0 (open, including for commercial use, with attribution). The released weights do not redistribute raw audio.

The internal 56-class SED training corpus itself remains non-public under an in-progress data-sharing agreement within the original collaboration; representative class examples and a per-class taxonomy are planned for release in a forthcoming companion paper.

The Detect-Group-Promote-Union (DGPU) framework and the CCED2 unknownness score (the methods that consume these weights) are subject to patent applications filed by Biologging Solutions Inc.; the CC BY 4.0 license on the released weights does not grant any rights under those patents.

Note: The source code for using these models is released under the MIT License at the GitHub repository linked above.

Citation

If you use this model in your research, please cite our paper:

@article{noda2026discovery,
  title={Discovery and promotion of unknown sounds into operational detection targets for underwater passive acoustic monitoring under false alarm constraints},
  author={Noda, Takuji and Koizumi, Takuya},
  journal={Scientific Reports},
  note={Revision, in review},
  year={2026}
}

Change log

2026-05-09 follow-up: window-aware extraction (no weight changes)

A latent bug in the embedding-extraction script (SegDataset.__getitem__ ignored center_sec, returning per-file constant embeddings) was discovered and fixed on 2026-05-08. The fix affects only the extraction code in the GitHub repository — encoder weights in this repository are byte-identical before and after the fix (the bug occurred downstream of the encoder forward pass). All current-revision result tables (Tables 2/3/4 and Fig 3) were re-computed with the corrected window-aware extractor; updated paper artifacts are tracked under paper_artifacts/winaware_2026-05-09/ and paper_artifacts/supp_table_s3_winaware_2026-05-09/. The HICEAS evaluation set was also reduced from 10 species to 7 to enforce strict 0–8 kHz in-band consistency (Nyquist of the 16-kHz BEATs input); species whose dominant call energy lies above 8 kHz are listed in the GitHub REVISION2.md. SHA-256 fingerprints of beats_dapt_mam_step120000.pt and sed_head_56_fulldata_ep8.pt are unchanged from the current revision listed in the table above.
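To illustrate the kind of fix described above, the sketch below derives a sample range from `center_sec`, so that different windows of the same file produce different crops. It is not the repository's `SegDataset` code: the function names, the `window_sec` parameter and the edge-clamping policy are assumptions. The second helper shows the Nyquist limit behind the 0–8 kHz in-band rule for 16 kHz input.

```python
def window_bounds(center_sec, window_sec, sample_rate, n_samples):
    """Sample indices [start, end) of a window centred on center_sec, clamped to the file."""
    win = int(round(window_sec * sample_rate))
    start = int(round((center_sec - window_sec / 2) * sample_rate))
    start = max(0, min(start, max(n_samples - win, 0)))  # keep the window inside the file
    end = min(start + win, n_samples)
    return start, end


def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


SAMPLE_RATE = 16_000          # BEATs input rate from the model card
n_samples = 10 * SAMPLE_RATE  # hypothetical 10 s recording

early = window_bounds(1.0, 2.0, SAMPLE_RATE, n_samples)
late = window_bounds(5.0, 2.0, SAMPLE_RATE, n_samples)
```

An extractor that ignores `center_sec` would return the same crop for both calls, which matches the per-file constant embeddings described above.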

Current revision (May 2026)

DAPT method changed from SimCLR/InfoNCE to Masked Audio Modeling (MAM) with k-means k=1024 tokeniser; precision changed from AMP fp16 to bfloat16 (corrects the original numerical instability that prevented weight updates)
DAPT corpus expanded from ~4,400 h to 5,673 h
New canonical weights uploaded: beats_dapt_mam_step120000.pt, sed_head_56_fulldata_ep8.pt
Legacy buggy weights (beats_dapt_topup_encoder.pt, sed_head_56_topup_ep8.pt) removed: the encoder was byte-identical to Microsoft BEATs AS-2M PRETRAIN due to the AMP fp16 weight-update failure and so provided no information beyond the publicly available PRETRAIN weights, and the SED head had been trained on top of that unchanged encoder

Original release (December 2025)

SimCLR/InfoNCE DAPT on ~4,400-h World-DAPT corpus
Note: the encoder weights were inadvertently equivalent to PRETRAIN due to an AMP fp16 numerical instability discovered in April 2026

Acknowledgements

The model architecture is based on BEATs (Microsoft). We thank the creators of BEATs and the providers of the open-source ocean acoustic datasets used for DAPT (NOAA SanctSound, US Navy USWTR, NOAA NRS/ONMS, ICListen / ONC, NPS Glacier Bay, PALAOA Ekström Ice Shelf).
