OceanBEATs
OceanBEATs is a foundation model for underwater acoustic monitoring, adapted from BEATs via Domain-Adaptive Pretraining (DAPT) on a 5,673-h global ocean soundscape corpus (World-DAPT).
This model serves as the "ears" for underwater soundscapes described in our paper: "Discovery and promotion of unknown sounds into operational detection targets for underwater passive acoustic monitoring under false alarm constraints" (Scientific Reports, revision in review).
About this revision (May 2026). The original December 2025 release used SimCLR/InfoNCE-based DAPT under AMP fp16, which suffered a numerical instability that prevented any update of the BEATs encoder weights (the beats_dapt_topup_encoder.pt weights were therefore byte-identical to Microsoft's BEATs AS-2M PRETRAIN). The current revision corrects this by re-running DAPT with Masked Audio Modeling (MAM) and a k-means (k=1024) tokeniser under bfloat16 precision on a larger 5,673-h World-DAPT corpus. The superseded buggy weights have been removed from this repository (see "Reproducibility of the original (buggy) state" below for how to recreate them if needed).
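As a rough illustration of what masked audio modeling against k-means token labels involves, the toy sketch below assigns each patch feature to its nearest centroid and then scores predictions only at masked positions. It is not the repository's training code: the function names `assign_tokens` and `masked_cross_entropy`, the 2-D features, the three centroids (standing in for k=1024), the mask pattern, and the uniform logits are all assumptions made for the example.

```python
import math

def assign_tokens(features, centroids):
    """Label each patch feature with the index of its nearest centroid."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(centroids)), key=lambda k: sqdist(f, centroids[k]))
        for f in features
    ]

def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over masked positions only (unmasked ones are skipped)."""
    losses = []
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        top = max(row)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(v - top) for v in row))
        losses.append(log_z - row[target])
    return sum(losses) / len(losses)

# Toy data: 4 patches, 3 centroids (hypothetical values).
centroids = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
features = [(0.1, -0.1), (0.9, 1.2), (-1.1, 0.8), (0.8, 0.9)]
targets = assign_tokens(features, centroids)

# Hide patches 1 and 3; a real model would predict their token IDs from context.
mask = [False, True, False, True]
logits = [[0.0, 0.0, 0.0] for _ in features]  # untrained, uniform predictions
loss = masked_cross_entropy(logits, targets, mask)
```

With uniform logits the loss equals the log of the number of clusters, which is the baseline a trained model has to beat.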
Model Details (current canonical revision)
- Architecture: BEATs (Audio Transformer; Microsoft)
- Self-supervised pretraining: Masked Audio Modeling (MAM) with k=1024 k-means cluster labels on PRETRAIN BEATs patch features
- DAPT corpus: World-DAPT, 5,673 h (NOAA SanctSound, US Navy USWTR, NOAA NRS/ONMS, ICListen / ONC, NPS Glacier Bay, PALAOA Ekström Ice Shelf)
- Optimisation: AdamW, batch 16, learning rate 1e-4 (cosine decay), bfloat16 precision, 1 epoch (126,365 steps); checkpoint BEATs_DAPT_MAM_step120000.pt selected by best 56-class SED validation
- Input: 16 kHz mono waveform
- Backbone init: BEATs AS-2M (iter3+)
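The optimisation settings above can be made concrete with a short sketch. The model card states only "cosine decay" with a peak of 1e-4; the absence of warm-up and the decay floor of zero are assumptions here, and `cosine_lr` is a hypothetical helper. The last two lines are plain arithmetic on the listed figures: if every step consumes 16 distinct clips and the single epoch covers the corpus once, the numbers imply clips of roughly ten seconds.

```python
import math

PEAK_LR = 1e-4          # from the model card
TOTAL_STEPS = 126_365   # from the model card (1 epoch)
BATCH_SIZE = 16         # from the model card
CORPUS_HOURS = 5_673    # from the model card

def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr (no warm-up: an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rate at the selected checkpoint, under the assumptions above.
lr_at_checkpoint = cosine_lr(120_000)

# Implied clip length if the epoch visits each clip exactly once.
examples_per_epoch = BATCH_SIZE * TOTAL_STEPS
seconds_per_example = CORPUS_HOURS * 3600 / examples_per_epoch
```

Under these assumptions the step-120,000 checkpoint sits near the end of the schedule, where the learning rate has already decayed to a small fraction of its peak.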
Available files (current canonical revision)
| File | SHA-256 | Size |
|---|---|---|
| beats_dapt_mam_step120000.pt | 0fe9f7dd92780c2e564f1df06a192482dbcb9a56bdab4202f4d94862b9168f89 | 361 MB |
| sed_head_56_fulldata_ep8.pt | 135d11738a6619a57769955468ce5cb6eee3f07044fa45e6c950bf25ac4f8f60 | 18 MB |
sed_head_56_fulldata_ep8.pt is the 56-class SED head trained on top of the
above encoder. Single-seed Event F1 = 0.483; n=10 mean ± std = 0.475 ± 0.017
(per manuscript Table 1 footnote).
Reproducibility of the original (buggy) state
The original December 2025 release contained two files that have been removed in the current revision:
| Removed file | Replacement / how to recreate |
|---|---|
| beats_dapt_topup_encoder.pt | Functionally equivalent to Microsoft's BEATs AS-2M PRETRAIN (BEATs_iter3_plus_AS2M.pt); use that file directly to reproduce the original buggy DAPT-encoder behaviour. |
| sed_head_56_topup_ep8.pt | Re-train a 56-class SED head on top of the BEATs AS-2M PRETRAIN encoder using the protocol in Methods §4.2.3 of the manuscript (scripts/train_sed_beats_weak_plus.py in the GitHub repository). |
The commit history of the removed files is preserved in the HuggingFace
repository's commit log. The SHA-256 of the deleted encoder was identical
to that of Microsoft's BEATs_iter3_plus_AS2M.pt because the AMP fp16 NaN
bug prevented any weight updates.
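The model card does not say which operation produced the NaNs, so the sketch below only illustrates the general difference in dynamic range that usually motivates moving from fp16 to bfloat16. It uses the standard-library `struct` module: the `"e"` format is IEEE 754 half precision, and bfloat16 is emulated by keeping the top 16 bits of the float32 encoding (truncation rather than round-to-nearest, which is a simplification). The helper names are hypothetical.

```python
import math
import struct

def fp16_roundtrip(x: float) -> float:
    """Round x to IEEE 754 half precision and back; +/-inf if out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses values beyond the fp16 range
        return math.copysign(math.inf, x)

def bf16_roundtrip(x: float) -> float:
    """Emulate bfloat16 by truncating the float32 encoding to its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# A value just above the fp16 maximum (65504) overflows in fp16 only.
big_fp16 = fp16_roundtrip(70000.0)
big_bf16 = bf16_roundtrip(70000.0)

# A very small value underflows to zero in fp16 but survives in bfloat16.
tiny_fp16 = fp16_roundtrip(1e-8)
tiny_bf16 = bf16_roundtrip(1e-8)
```

bfloat16 keeps the 8-bit exponent of float32 at the cost of mantissa precision, so large and small magnitudes that break fp16 remain representable.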
Usage
These weights are designed to be used with the official code repository:
GitHub Repository: alohajazz/openworld-soundscape-cced2-dgpu
from huggingface_hub import hf_hub_download

# Download canonical revision weights
encoder_path = hf_hub_download(
    repo_id="BiologgingSolutions/OceanBEATs",
    filename="beats_dapt_mam_step120000.pt",
    local_dir="weights/",
)
sed_head_path = hf_hub_download(
    repo_id="BiologgingSolutions/OceanBEATs",
    filename="sed_head_56_fulldata_ep8.pt",
    local_dir="weights/",
)
After download, verify the SHA-256 fingerprints against the values listed above to ensure file integrity.
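One way to run that check is sketched below with the standard-library `hashlib` module. The expected digests are copied from the file table in this model card; the helper names `sha256_of` and `verify_weights`, and the default `weights` directory (matching `local_dir` in the download snippet), are choices made for the example.

```python
import hashlib
from pathlib import Path

# Digests copied from the "Available files" table of this model card.
EXPECTED_SHA256 = {
    "beats_dapt_mam_step120000.pt":
        "0fe9f7dd92780c2e564f1df06a192482dbcb9a56bdab4202f4d94862b9168f89",
    "sed_head_56_fulldata_ep8.pt":
        "135d11738a6619a57769955468ce5cb6eee3f07044fa45e6c950bf25ac4f8f60",
}

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large checkpoints are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_weights(directory="weights"):
    """Map each filename to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = (sha256_of(path) == expected) if path.is_file() else None
    return results

if __name__ == "__main__":
    for name, ok in verify_weights().items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{name}: {status}")
```

A mismatch most often points to an interrupted download; deleting the file and downloading it again is the usual first step.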
License & Data Availability
License: CC BY 4.0 (Creative Commons Attribution 4.0 International)
These weights are released under CC BY 4.0 (open, including for commercial use, with attribution). The released weights do not redistribute raw audio.
The internal 56-class SED training corpus itself remains non-public under an in-progress data-sharing agreement within the original collaboration; representative class examples and a per-class taxonomy are planned for release in a forthcoming companion paper.
The Detect-Group-Promote-Union (DGPU) framework and the CCED2 unknownness score (the methods that consume these weights) are subject to patent applications filed by Biologging Solutions Inc.; the CC BY 4.0 license on the released weights does not grant any rights under those patents.
Note: The source code for using these models is released under the MIT License at the GitHub repository linked above.
Citation
If you use this model in your research, please cite our paper:
@article{noda2026discovery,
title={Discovery and promotion of unknown sounds into operational detection targets for underwater passive acoustic monitoring under false alarm constraints},
author={Noda, Takuji and Koizumi, Takuya},
journal={Scientific Reports},
note={Revision, in review},
year={2026}
}
Change log
2026-05-09 follow-up: window-aware extraction (no weight changes)
A latent bug in the embedding-extraction script (SegDataset.__getitem__
ignored center_sec, returning per-file constant embeddings) was discovered
and fixed on 2026-05-08. The fix affects only the extraction code in the
GitHub repository — encoder weights in this repository are byte-identical
before and after the fix (the bug occurred downstream of the encoder
forward pass). All current-revision result tables (Tables 2/3/4 and Fig 3)
were re-computed with the corrected window-aware extractor; updated paper
artifacts are tracked under paper_artifacts/winaware_2026-05-09/ and
paper_artifacts/supp_table_s3_winaware_2026-05-09/.
The HICEAS evaluation set was also reduced from 10 species to 7 to enforce
strict 0–8 kHz in-band consistency (Nyquist of the 16-kHz BEATs input);
species whose dominant call energy lies above 8 kHz are listed in the
GitHub REVISION2.md. SHA-256 fingerprints of
beats_dapt_mam_step120000.pt and sed_head_56_fulldata_ep8.pt are
unchanged from the current revision listed in the table above.
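To make the nature of the extraction bug concrete, here is a minimal sketch of what "window-aware" slicing means: the segment passed to the encoder depends on `center_sec` instead of being the same for every row of a file. Only `SegDataset` and `center_sec` come from the change log; `window_sec`, `window_bounds`, `extract_window`, and the choice to clamp at file edges (rather than pad) are assumptions, and the real repository code may differ.

```python
SAMPLE_RATE = 16_000  # BEATs input rate stated in this model card

def window_bounds(center_sec, window_sec, n_samples, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices of a window centred on center_sec.

    The window is clamped to the file boundaries (an assumption; padding
    would be an equally plausible choice).
    """
    half = int(round(window_sec * sample_rate / 2))
    center = int(round(center_sec * sample_rate))
    start = max(center - half, 0)
    end = min(center + half, n_samples)
    return start, end

def extract_window(waveform, center_sec, window_sec, sample_rate=SAMPLE_RATE):
    """Slice the waveform around center_sec so each annotation gets its own segment."""
    start, end = window_bounds(center_sec, window_sec, len(waveform), sample_rate)
    return waveform[start:end]

# Toy 10-second "recording" whose samples are just their own indices.
recording = list(range(10 * SAMPLE_RATE))
segment_a = extract_window(recording, center_sec=5.0, window_sec=2.0)
segment_b = extract_window(recording, center_sec=2.0, window_sec=2.0)
```

With a slice like this, two annotations in the same file yield different segments and therefore different embeddings, which is the behaviour the fix restores.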
Current revision (May 2026)
- DAPT method changed from SimCLR/InfoNCE to Masked Audio Modeling (MAM) with k-means k=1024 tokeniser; precision changed from AMP fp16 to bfloat16 (corrects the original numerical instability that prevented weight updates)
- DAPT corpus expanded from ~4,400 h to 5,673 h
- New canonical weights uploaded: beats_dapt_mam_step120000.pt, sed_head_56_fulldata_ep8.pt
- Legacy buggy weights (beats_dapt_topup_encoder.pt, sed_head_56_topup_ep8.pt) removed: the encoder was byte-identical to Microsoft BEATs AS-2M PRETRAIN due to the AMP fp16 weight-update failure and provided no information beyond the publicly available PRETRAIN weights, and the SED head had been trained on top of that unchanged encoder
Original release (December 2025)
- SimCLR/InfoNCE DAPT on ~4,400-h World-DAPT corpus
- Note: the encoder weights were inadvertently equivalent to PRETRAIN due to an AMP fp16 numerical instability discovered in April 2026
Acknowledgements
The model architecture is based on BEATs (Microsoft). We thank the creators of BEATs and the providers of the open ocean acoustic datasets used for DAPT (NOAA SanctSound, US Navy USWTR, NOAA NRS/ONMS, ICListen / ONC, NPS Glacier Bay, PALAOA Ekström Ice Shelf).