WriteSAE Checkpoints

Sparse autoencoders trained on recurrent state writes. This repository contains the checkpoints and result files for WriteSAE: Sparse Autoencoders for Recurrent State, an anonymous NeurIPS 2026 submission.

The main artifact is a WriteSAE trained on Qwen3.5-0.8B, layer 9, head 4. The repo also includes cross-layer, cross-scale, and cross-architecture checkpoints plus JSON files for the paper's reported tests.

What is here

| Item | Location | Notes |
|---|---|---|
| WriteSAE checkpoints | `writesae/` | Primary rank-1 decoder SAEs. |
| FlatSAE + SVD baselines | `flat_baseline/` | Baselines for cited comparison cells. |
| Result files | `results/` | JSON outputs grouped by paper claim. |
| Manifest | `manifest.json` | SHA256 hashes, sizes, and descriptions (see the verification sketch below). |
| Loading example | `LOAD_EXAMPLE.py` | Minimal checkpoint-loading script. |
| Reproducibility card | `MODEL_CARD.md` | Reproducibility metadata. |
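
To check a local download against `manifest.json`, you can hash the files and compare. The sketch below is an illustration that assumes each manifest entry maps a relative path to an object with a `sha256` field; adjust it to the actual manifest schema.

```python
import hashlib
import json
from pathlib import Path

ckpt_dir = Path("matrix-sae-ckpts")  # local snapshot directory
manifest = json.loads((ckpt_dir / "manifest.json").read_text())

# Assumed schema: {"writesae/.../best.pt": {"sha256": "...", ...}, ...}
for rel_path, meta in manifest.items():
    file_path = ckpt_dir / rel_path
    if not file_path.exists():
        continue  # partial downloads via allow_patterns skip most files
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
    print("ok " if digest == meta["sha256"] else "MISMATCH", rel_path)
```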

This repository does not include base model weights. Load those from their upstream Hugging Face repositories.
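
For experiments that rerun interventions you also need the corresponding base model. A minimal loading sketch with `transformers`, using a placeholder repository id that should be replaced with the actual upstream repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base_repo = "<upstream-org>/<base-model>"  # placeholder: the upstream repo for the model you probe

tokenizer = AutoTokenizer.from_pretrained(base_repo)
model = AutoModelForCausalLM.from_pretrained(base_repo, torch_dtype="auto", device_map="auto")
```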

Quickstart

```python
from huggingface_hub import snapshot_download
import torch

ckpt_dir = snapshot_download(
    "anon-writesae/matrix-sae-ckpts",
    allow_patterns=["writesae/qwen0p8b/L9_H4/*"],
)

ckpt = torch.load(
    f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt",
    weights_only=False,
    map_location="cpu",
)

print(ckpt["config"])
print(ckpt["val_mse"])

# Decoder atom 412, used as the paper's ERASE example.
v_412 = ckpt["sae"].decoder.v[412]   # (d_k,)
w_412 = ckpt["sae"].decoder.w[412]   # (d_v,)
atom = torch.outer(v_412, w_412)     # (d_k, d_v)
```

For a standalone example, see LOAD_EXAMPLE.py.
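
Continuing from the Quickstart snippet, the sketch below shows one way a rank-1 atom could be used in an ERASE-style intervention: projecting the atom out of a state-write matrix under the Frobenius inner product. The `state_write` tensor and the projection recipe are illustrative assumptions; the paper's actual edit procedure lives in the source repository (`scripts/clean_amplified_kl.py`).

```python
import torch

# Hypothetical cached state-write matrix for one token, shape (d_k, d_v).
state_write = torch.randn(v_412.shape[0], w_412.shape[0])

# Project out the component of the write that lies along atom 412.
atom_unit = atom / atom.norm()                    # normalize under the Frobenius norm
coeff = (state_write * atom_unit).sum()           # Frobenius inner product <state_write, atom_unit>
erased_write = state_write - coeff * atom_unit

print((erased_write * atom_unit).sum().item())    # ~0: the atom direction has been removed
```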

Repository layout

```
matrix-sae-ckpts/
  README.md
  MODEL_CARD.md
  manifest.json
  LOAD_EXAMPLE.py
  LICENSE

  writesae/
    qwen0p8b/L9_H4/        # primary cell
    qwen0p8b/L1_H4/
    qwen0p8b/L17_H4/
    qwen4b/L12_H8/
    qwen27b/L32_H16/
    mamba2-370m/L24_H0/
    rwkv7-1.5b/L12_H0/
    deltanet-1.3b/L12_H8/
    gla-1.3b/L12_H0/

  flat_baseline/
    qwen0p8b_L9_H4/
    mamba2-370m_L24_H0/
    rwkv7-1.5b_L12_H0/

  results/
    92pct_substitution/
    89pct_population_test_L9_H4/
    closed_form_R2_per_layer/
    memory_edit_F412/
    predictive_steering_84pct/
    behavioral_steering_100pct_midrank/
    cross_arch_mamba2_88pct/
    all_16_heads_L9_representativeness/
    falsification/
    ablations/
```

Included model families

| Family | Checkpoint path | Role |
|---|---|---|
| Qwen3.5-0.8B | `writesae/qwen0p8b/` | Main experiments and ablations. |
| Qwen3.5-4B | `writesae/qwen4b/L12_H8/` | Cross-scale check. |
| Qwen3.5-27B | `writesae/qwen27b/L32_H16/` | Cross-scale replication. |
| Mamba-2-370M | `writesae/mamba2-370m/L24_H0/` | Cross-architecture test. |
| RWKV-7-1.5B | `writesae/rwkv7-1.5b/L12_H0/` | Cross-architecture test. |
| DeltaNet-1.3B | `writesae/deltanet-1.3b/L12_H8/` | Cross-architecture test. |
| GLA-1.3B | `writesae/gla-1.3b/L12_H0/` | Cross-architecture test. |

Main result files

| Result | Files |
|---|---|
| 92.4% substitution wins, n=4,851 | `results/92pct_substitution/` |
| 89.8% population test over 87 atoms | `results/89pct_population_test_L9_H4/` |
| Closed-form factorization, R^2 = 0.98 | `results/closed_form_R2_per_layer/` |
| F412 ERASE memory edit, -0.116 nats | `results/memory_edit_F412/` |
| Predictive steering, 84.6% sign agreement | `results/predictive_steering_84pct/` |
| Behavioral steering, 100% midrank install | `results/behavioral_steering_100pct_midrank/` |
| Mamba-2 cross-architecture test, 88.08% | `results/cross_arch_mamba2_88pct/` |
| Null and falsification checks | `results/falsification/` |
| Appendix ablations | `results/ablations/` |
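
The result directories can be inspected without loading any model. A small sketch, assuming each claim directory holds JSON files whose top-level keys describe the test (the exact schema varies per test):

```python
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Fetch only the result files for one claim.
res_dir = snapshot_download(
    "anon-writesae/matrix-sae-ckpts",
    allow_patterns=["results/92pct_substitution/*"],
)
for path in sorted(Path(res_dir, "results", "92pct_substitution").glob("*.json")):
    data = json.loads(path.read_text())
    print(path.name, sorted(data)[:10])  # preview top-level keys
```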

End-to-end reproduction

The code is in the anonymous source repository, separate from this checkpoint repository:

```bash
git clone https://anonymous.4open.science/r/WriteSAE-6158
cd WriteSAE-6158
pip install -e .
python scripts/clean_amplified_kl.py \
  --feature 412 \
  --sae-checkpoint <hf-cache>/writesae/qwen0p8b/L9_H4/best.pt \
  --states-dir <local>/states/Qwen3.5-0.8B/L9 \
  --out out/headline.json
```

You will need local activation caches for commands that rerun interventions. This Hugging Face repo stores checkpoints and reported outputs, not the full extracted state cache.
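
The `<hf-cache>` placeholder above stands for the local snapshot directory of this checkpoint repository; one way to resolve it:

```python
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download("anon-writesae/matrix-sae-ckpts")
# Pass f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt" as --sae-checkpoint above.
print(ckpt_dir)
```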

Intended use

Use these files for:

  • mechanistic interpretability research on recurrent state writes;
  • reproducing the paper's SAE substitution and steering analyses;
  • comparing rank-1 decoder SAEs with flat baselines;
  • follow-up work on state-write interventions.

Do not use these files as production model-editing tools or as safety interventions without independent validation.

Limitations

  • The strongest firing-level causal evidence is centered on Qwen3.5-0.8B, layer 9, head 4.
  • Cross-scale and cross-architecture files are included, but they do not make the same per-atom identity claim as the primary cell.
  • SAE atom identity is seed-specific. The paper's transferable claim is about class-level structure, not exact atom matching across runs.
  • Base model licenses and behavior are inherited from the upstream model providers.
  • Several reproduction commands require activation caches that are too large to include here.

License

  • Checkpoints and code in this artifact: MIT, see LICENSE.
  • Base models keep their upstream licenses.
  • No base model weights are redistributed in this repository.

Citation

```bibtex
@inproceedings{anon2026writesae,
  title     = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author    = {Anonymous},
  booktitle = {Submitted to NeurIPS 2026},
  year      = {2026}
}
```