How to use zengrh3/wmsteer-a-audioseal with AudioSeal:

```python
# Watermark Generator
from audioseal import AudioSeal

model = AudioSeal.load_generator("zengrh3/wmsteer-a-audioseal")
# pass a tensor (tensor_wav) of shape (batch, channels, samples) and a sample rate
wav, sr = tensor_wav, 16000
watermark = model.get_watermark(wav, sr)
watermarked_audio = wav + watermark

# Watermark Detector
from audioseal import AudioSeal

detector = AudioSeal.load_detector("zengrh3/wmsteer-a-audioseal")
result, message = detector.detect_watermark(watermarked_audio, sr)
```
WMSteer-A: Auditing AudioSeal in Activation Space
A representation-guided universal removal attack on AudioSeal — and an activation-forensics defense that catches it.
TL;DR AudioSeal's detector encodes "watermarked vs. clean" as a single, perfectly linearly separable concept (linear-probe AUC = 1.000 with 100 probe clips). Contrastive activation analysis + back-projection yields a universal waveform perturbation $\delta_{\rm univ}$ that flips 93% of attacked clips below AudioSeal's default decision threshold at ~18 dB SI-SDR. The attack transfers near-identically across four detectors retrained from the public weights. A small MLP on detector activations (WMShield) detects the attack at TPR@1%FPR = 0.993.
📄 Paper: wmsteer_a.pdf (10 pages: 8 main + 1 appendix + 1 references)
Headline numbers
| Setting | TPR@5%FPR (attacked) | Fraction with p < 0.5 (default rule) | SI-SDR of δ |
|---|---|---|---|
| $D_A$ no channel | 0.379 [0.285, 0.465] | 1.00 | 18.1 dB |
| $D_A$ + AAC@64k chan | 0.986 [0.910, 1.000] | 0.93 | — |
| $D_B$ s=111 no channel | 0.07 (TPR@1%FPR) | 1.00 | — |
| $D_B$ s=222 no channel | 0.07 | 1.00 | — |
| $D_B$ s=444 2× LR | 0.075 | 1.00 | — |
WMShield defense: TPR@1%FPR = 0.993, TPR@5%FPR = 1.000.
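The SI-SDR column reports how loud the perturbation is relative to the audio it is added to. A minimal sketch of the standard scale-invariant SDR computation is given below; the function name `si_sdr` and the choice of which signal counts as the reference are assumptions for illustration and are not taken from `src/`.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition).

    Assumption: `reference` is the unattacked waveform and `estimate` is the
    attacked one (reference + delta); both are 1-D arrays of equal length.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Remove DC offset so the projection below is not biased by the mean.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

# Toy check: a sine "clip" plus an orthogonal cosine at one tenth the amplitude.
t = np.arange(16000) / 16000.0
clip = np.sin(2 * np.pi * 100 * t)
delta = 0.1 * np.cos(2 * np.pi * 100 * t)
toy_value = si_sdr(clip, clip + delta)
```

In this toy case the distortion carries one hundredth of the signal power, so the value lands at about 20 dB, close to the regime the table reports.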
Method (one figure)
Offline, the attacker watermarks 100 LibriSpeech probe clips with the public AudioSeal generator $G$, hooks the detector $D_A$'s encoder to extract a contrastive watermark direction $\bar v = \mathbb{E}[h_{\rm wm}] - \mathbb{E}[h_{\rm clean}]$, and back-projects it through $D_A$ via Adam to obtain a single universal waveform $\delta_{\rm univ}$ ($\|\delta\|_\infty \le \varepsilon$). At attack time, $\delta_{\rm univ}$ is added to any watermarked clip; the verifier's unmodified detector $D_V$ classifies the result as clean.
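The two offline steps (mean-difference direction, then back-projection under an $L_\infty$ budget) can be illustrated on a toy problem. The sketch below is not the code in `src/`: the encoder `tanh(W x)`, the synthetic clips, the loss (mean projection onto the direction), and the hand-written Adam loop are all assumptions chosen so the example runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_hidden, n_clips = 256, 32, 100

# Hypothetical stand-in for the detector encoder: h(x) = tanh(W x).
W = rng.normal(scale=1.0 / np.sqrt(n_samples), size=(n_hidden, n_samples))

def encode(x):
    return np.tanh(x @ W.T)

# Toy "clean" probe clips and "watermarked" versions (clean plus a fixed pattern).
clean = rng.normal(scale=0.1, size=(n_clips, n_samples))
pattern = 0.02 * rng.normal(size=n_samples)
watermarked = clean + pattern

# Contrastive direction v = E[h_wm] - E[h_clean], normalised to unit length.
v = encode(watermarked).mean(axis=0) - encode(clean).mean(axis=0)
v_hat = v / np.linalg.norm(v)

def loss_and_grad(delta):
    """Mean projection of h(x_wm + delta) onto v_hat, and its gradient w.r.t. delta."""
    h = encode(watermarked + delta)                     # (n_clips, n_hidden)
    loss = float((h @ v_hat).mean())
    grad = (((1.0 - h ** 2) * v_hat) @ W).mean(axis=0)  # chain rule through tanh
    return loss, grad

# Back-projection: Adam on one shared delta, clipped to the L_inf ball each step.
eps_budget, lr, n_steps = 0.01, 1e-3, 600
beta1, beta2 = 0.9, 0.999
delta = np.zeros(n_samples)
m = np.zeros_like(delta)
s = np.zeros_like(delta)
loss_before, _ = loss_and_grad(delta)
for step in range(1, n_steps + 1):
    _, g = loss_and_grad(delta)
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** step)
    s_hat = s / (1 - beta2 ** step)
    delta = delta - lr * m_hat / (np.sqrt(s_hat) + 1e-8)
    delta = np.clip(delta, -eps_budget, eps_budget)     # enforce |delta|_inf <= eps
loss_after, _ = loss_and_grad(delta)
```

The same `delta` is shared across all clips, which is what makes the perturbation universal; in the real pipeline the gradient would come from autograd through $D_A$ rather than from a closed-form expression.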
Repo contents
| Path | What |
|---|---|
| `wmsteer_a.pdf` | The paper (10 pages, paper/ source available on request). |
| `figures/` | 12 figures used in the paper (PDF). |
| `src/` | Full PyTorch implementation, ROCm-compatible. |
| `scripts/run.sh` | Wrapper that bakes ROCm/MIOpen environment fixes. |
| `RESULTS.md` | Aggregated experiment summary. |
| `LITERATURE_SURVEY.md` | Background literature notes. |
Reproducing
```bash
git clone https://github.com/facebookresearch/audioseal
pip install audioseal pesq pyloudnorm soundfile librosa scipy datasets matplotlib

# Download 400 LibriSpeech-test-clean clips
python scripts/fetch_libri.py --n 400

# Run the kill experiment (~1 GPU-hour on 1× MI210 / A100)
scripts/run.sh -m src.block1_kill --probe-n 100 --heldout-n 200 --rank 4 --eps 0.01 --n-steps 600

# Bootstrap CIs + multi-FPR
scripts/run.sh -m src.post_analysis --n-bootstrap 1000

# Cross-detector transfer
scripts/run.sh -m src.block7_transfer

# Baselines (PGD, UAP, FFF, controls)
scripts/run.sh -m src.block5_baselines

# Defense (WMShield)
scripts/run.sh -m src.block6_defense
```
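The post-analysis step reports TPR at a fixed FPR with bootstrap confidence intervals. A minimal sketch of one common way to compute such numbers follows; the function names, the thresholding rule, and the percentile bootstrap are assumptions for illustration and may differ from what `src.post_analysis` does.

```python
import numpy as np

def tpr_at_fpr(pos_scores, neg_scores, fpr=0.05):
    """TPR at a threshold chosen so that at most `fpr` of negatives exceed it."""
    pos = np.asarray(pos_scores, dtype=np.float64)
    neg = np.sort(np.asarray(neg_scores, dtype=np.float64))
    # Number of negatives allowed strictly above the threshold.
    k = int(np.floor(fpr * len(neg) + 1e-9))
    threshold = neg[len(neg) - k - 1]
    return float(np.mean(pos > threshold))

def bootstrap_tpr_ci(pos_scores, neg_scores, fpr=0.05,
                     n_bootstrap=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos_scores, dtype=np.float64)
    neg = np.asarray(neg_scores, dtype=np.float64)
    stats = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Resample positives and negatives independently, with replacement.
        pos_b = rng.choice(pos, size=len(pos), replace=True)
        neg_b = rng.choice(neg, size=len(neg), replace=True)
        stats[b] = tpr_at_fpr(pos_b, neg_b, fpr)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return tpr_at_fpr(pos, neg, fpr), float(lo), float(hi)

# Toy usage with synthetic detector scores (not results from the paper).
rng = np.random.default_rng(1)
neg_scores = rng.uniform(0.0, 0.6, size=200)   # clean clips
pos_scores = rng.uniform(0.3, 1.0, size=200)   # watermarked clips
point, lo, hi = bootstrap_tpr_ci(pos_scores, neg_scores, fpr=0.05)
```

Here positives are clips that should be flagged (watermarked audio for the detector, attacked audio for WMShield) and negatives are clips that should not be.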
Key design choices and ROCm notes
- Strip `weight_norm` parametrization on CPU before moving AudioSeal to GPU. MIOpen 6.3 on `gfx90a` crashes on weight-norm-wrapped 1D conv.
- Disable TorchInductor (`TORCHDYNAMO_DISABLE=1`) on ROCm: convs go through a path that the MIOpen kernel cache cannot resolve.
- MIOpen RNN backward requires `model.train()` even with frozen weights: AudioSeal's encoder uses LSTMs; toggle via context manager.
- Redirect MIOpen kernel cache via `MIOPEN_USER_DB_PATH` (system path is read-only); symlink `gfx90a*.{tn,ktn}.model` into the user cache.
- All baked into `scripts/run.sh`.
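The train-mode toggle mentioned above can be written as a small context manager. This is a minimal sketch, not the helper in `src/`: the name `train_mode` is hypothetical, and it relies only on the `training` attribute and `train(mode)` method that `torch.nn.Module` provides.

```python
from contextlib import contextmanager

@contextmanager
def train_mode(module):
    """Temporarily put `module` in train mode, restoring its previous mode on exit.

    Works with any object exposing `.training` and `.train(mode)`, such as a
    torch.nn.Module. Parameters stay frozen if requires_grad is already False.
    """
    was_training = module.training
    module.train(True)
    try:
        yield module
    finally:
        # Restore the original mode even if the backward pass raises.
        module.train(was_training)
```

A typical use would wrap only the forward and backward pass, for example `with train_mode(detector): loss.backward()`. Train mode also activates dropout and batch-norm statistic updates where a model has them, so the scope should be kept as narrow as possible.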
Limitations / scope
- White-box on the public AudioSeal generator $G$ only; truly private generators are out of scope.
- AudioSeal 0.2 16-bit, LibriSpeech test-clean only. Cross-corpus and multilingual extensions are open.
- We do not attempt forgery (universal insertion of a specific message); the per-bit message head is multi-axis and structurally distinct.
- WMShield is a non-adaptive defender; the attacker–shield game is unmeasured.
Ethics and disclosure
This is a security audit of a deployed watermarking system using only the publicly released weights and publicly available speech. We do not target any specific deployed service. We disclose findings to the AudioSeal authors prior to publication and propose WMShield as a concrete mitigation.
Citation
```bibtex
@article{wmsteer2026,
  title={Auditing AudioSeal in Activation Space:
         A Linear Watermark Direction Yields a Universal,
         Cross-Detector Removal Perturbation, and Its Defense},
  author={Anonymous},
  year={2026},
  note={Submitted to INTERSPEECH 2027}
}
```