WriteSAE Checkpoints

Sparse autoencoders trained on recurrent state writes. This repository contains the checkpoints and result files for WriteSAE: Sparse Autoencoders for Recurrent State, an anonymous NeurIPS 2026 submission.

The main artifact is a WriteSAE trained on Qwen3.5-0.8B, layer 9, head 4. The repo also includes cross-layer, cross-scale, and cross-architecture checkpoints plus JSON files for the paper's reported tests.

What is here

| Item | Location | Notes |
|---|---|---|
| WriteSAE checkpoints | `writesae/` | Primary rank-1 decoder SAEs. |
| FlatSAE + SVD baselines | `flat_baseline/` | Baselines for cited comparison cells. |
| Result files | `results/` | JSON outputs grouped by paper claim. |
| Manifest | `manifest.json` | SHA256 hashes, sizes, and descriptions (see the verification sketch below). |
| Loading example | `LOAD_EXAMPLE.py` | Minimal checkpoint-loading script. |
| Reproducibility card | `MODEL_CARD.md` | Reproducibility metadata. |
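
To check a local download against `manifest.json`, you can hash the files and compare. The sketch below is an illustration that assumes each manifest entry maps a relative path to an object with a `sha256` field; adjust it to the actual manifest schema.

```python
import hashlib
import json
from pathlib import Path

ckpt_dir = Path("matrix-sae-ckpts")  # local snapshot directory
manifest = json.loads((ckpt_dir / "manifest.json").read_text())

# Assumed schema: {"writesae/.../best.pt": {"sha256": "...", ...}, ...}
for rel_path, meta in manifest.items():
    file_path = ckpt_dir / rel_path
    if not file_path.exists():
        continue  # partial downloads via allow_patterns skip most files
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
    print("ok " if digest == meta["sha256"] else "MISMATCH", rel_path)
```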

This repository does not include base model weights. Load those from their upstream Hugging Face repositories.
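
For experiments that rerun interventions you also need the corresponding base model. A minimal loading sketch with `transformers`, using a placeholder repository id that should be replaced with the actual upstream repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base_repo = "<upstream-org>/<base-model>"  # placeholder: the upstream repo for the model you probe

tokenizer = AutoTokenizer.from_pretrained(base_repo)
model = AutoModelForCausalLM.from_pretrained(base_repo, torch_dtype="auto", device_map="auto")
```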

Quickstart

```python
from huggingface_hub import snapshot_download
import torch

ckpt_dir = snapshot_download(
    "anon-writesae/matrix-sae-ckpts",
    allow_patterns=["writesae/qwen0p8b/L9_H4/*"],
)

ckpt = torch.load(
    f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt",
    weights_only=False,
    map_location="cpu",
)

print(ckpt["config"])
print(ckpt["val_mse"])

# Decoder atom 412, used as the paper's ERASE example.
v_412 = ckpt["sae"].decoder.v[412]   # (d_k,)
w_412 = ckpt["sae"].decoder.w[412]   # (d_v,)
atom = torch.outer(v_412, w_412)     # (d_k, d_v)
```

For a standalone example, see LOAD_EXAMPLE.py.
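
Continuing from the Quickstart snippet, the sketch below shows one way a rank-1 atom could be used in an ERASE-style intervention: projecting the atom out of a state-write matrix under the Frobenius inner product. The `state_write` tensor and the projection recipe are illustrative assumptions; the paper's actual edit procedure lives in the source repository (`scripts/clean_amplified_kl.py`).

```python
import torch

# Hypothetical cached state-write matrix for one token, shape (d_k, d_v).
state_write = torch.randn(v_412.shape[0], w_412.shape[0])

# Project out the component of the write that lies along atom 412.
atom_unit = atom / atom.norm()                    # normalize under the Frobenius norm
coeff = (state_write * atom_unit).sum()           # Frobenius inner product <state_write, atom_unit>
erased_write = state_write - coeff * atom_unit

print((erased_write * atom_unit).sum().item())    # ~0: the atom direction has been removed
```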

Repository layout

```
matrix-sae-ckpts/
  README.md
  MODEL_CARD.md
  manifest.json
  LOAD_EXAMPLE.py
  LICENSE

  writesae/
    qwen0p8b/L9_H4/        # primary cell
    qwen0p8b/L1_H4/
    qwen0p8b/L17_H4/
    qwen4b/L12_H8/
    qwen27b/L32_H16/
    mamba2-370m/L24_H0/
    rwkv7-1.5b/L12_H0/
    deltanet-1.3b/L12_H8/
    gla-1.3b/L12_H0/

  flat_baseline/
    qwen0p8b_L9_H4/
    mamba2-370m_L24_H0/
    rwkv7-1.5b_L12_H0/

  results/
    92pct_substitution/
    89pct_population_test_L9_H4/
    closed_form_R2_per_layer/
    memory_edit_F412/
    predictive_steering_84pct/
    behavioral_steering_100pct_midrank/
    cross_arch_mamba2_88pct/
    all_16_heads_L9_representativeness/
    falsification/
    ablations/
```

Included model families

| Family | Checkpoint path | Role |
|---|---|---|
| Qwen3.5-0.8B | `writesae/qwen0p8b/` | Main experiments and ablations. |
| Qwen3.5-4B | `writesae/qwen4b/L12_H8/` | Cross-scale check. |
| Qwen3.5-27B | `writesae/qwen27b/L32_H16/` | Cross-scale replication. |
| Mamba-2-370M | `writesae/mamba2-370m/L24_H0/` | Cross-architecture test. |
| RWKV-7-1.5B | `writesae/rwkv7-1.5b/L12_H0/` | Cross-architecture test. |
| DeltaNet-1.3B | `writesae/deltanet-1.3b/L12_H8/` | Cross-architecture test. |
| GLA-1.3B | `writesae/gla-1.3b/L12_H0/` | Cross-architecture test. |

Main result files

| Result | Files |
|---|---|
| 92.4% substitution wins, n=4,851 | `results/92pct_substitution/` |
| 89.8% population test over 87 atoms | `results/89pct_population_test_L9_H4/` |
| Closed-form factorization, R^2 = 0.98 | `results/closed_form_R2_per_layer/` |
| F412 ERASE memory edit, -0.116 nats | `results/memory_edit_F412/` |
| Predictive steering, 84.6% sign agreement | `results/predictive_steering_84pct/` |
| Behavioral steering, 100% midrank install | `results/behavioral_steering_100pct_midrank/` |
| Mamba-2 cross-architecture test, 88.08% | `results/cross_arch_mamba2_88pct/` |
| Null and falsification checks | `results/falsification/` |
| Appendix ablations | `results/ablations/` |
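
The result directories can be inspected without loading any model. A small sketch, assuming each claim directory holds JSON files whose top-level keys describe the test (the exact schema varies per test):

```python
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Fetch only the result files for one claim.
res_dir = snapshot_download(
    "anon-writesae/matrix-sae-ckpts",
    allow_patterns=["results/92pct_substitution/*"],
)
for path in sorted(Path(res_dir, "results", "92pct_substitution").glob("*.json")):
    data = json.loads(path.read_text())
    print(path.name, sorted(data)[:10])  # preview top-level keys
```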

End-to-end reproduction

The code is in the anonymous source repository, separate from this checkpoint repository:

```bash
git clone https://anonymous.4open.science/r/WriteSAE-6158
cd WriteSAE-6158
pip install -e .
python scripts/clean_amplified_kl.py \
  --feature 412 \
  --sae-checkpoint <hf-cache>/writesae/qwen0p8b/L9_H4/best.pt \
  --states-dir <local>/states/Qwen3.5-0.8B/L9 \
  --out out/headline.json
```

You will need local activation caches for commands that rerun interventions. This Hugging Face repo stores checkpoints and reported outputs, not the full extracted state cache.
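
The `<hf-cache>` placeholder above stands for the local snapshot directory of this checkpoint repository; one way to resolve it:

```python
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download("anon-writesae/matrix-sae-ckpts")
# Pass f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt" as --sae-checkpoint above.
print(ckpt_dir)
```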

Intended use

Use these files for:

  • mechanistic interpretability research on recurrent state writes;
  • reproducing the paper's SAE substitution and steering analyses;
  • comparing rank-1 decoder SAEs with flat baselines;
  • follow-up work on state-write interventions.

Do not use these files as production model-editing tools or as safety interventions without independent validation.

Limitations

  • The strongest firing-level causal evidence is centered on Qwen3.5-0.8B, layer 9, head 4.
  • Cross-scale and cross-architecture files are included, but they do not make the same per-atom identity claim as the primary cell.
  • SAE atom identity is seed-specific. The paper's transferable claim is about class-level structure, not exact atom matching across runs.
  • Base model licenses and behavior are inherited from the upstream model providers.
  • Several reproduction commands require activation caches that are too large to include here.

License

  • Checkpoints and code in this artifact: MIT, see LICENSE.
  • Base models keep their upstream licenses.
  • No base model weights are redistributed in this repository.

Citation

```bibtex
@inproceedings{anon2026writesae,
  title     = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author    = {Anonymous},
  booktitle = {Submitted to NeurIPS 2026},
  year      = {2026}
}
```