# WriteSAE Checkpoints

Sparse autoencoders trained on recurrent state writes. This repository contains the checkpoints and result files for *WriteSAE: Sparse Autoencoders for Recurrent State*, an anonymous NeurIPS 2026 submission.
The main artifact is a WriteSAE trained on Qwen3.5-0.8B, layer 9, head 4. The repo also includes cross-layer, cross-scale, and cross-architecture checkpoints plus JSON files for the paper's reported tests.
## What is here

| Item | Location | Notes |
|---|---|---|
| WriteSAE checkpoints | writesae/ | Primary rank-1 decoder SAEs. |
| FlatSAE + SVD baselines | flat_baseline/ | Baselines for cited comparison cells. |
| Result files | results/ | JSON outputs grouped by paper claim. |
| Manifest | manifest.json | SHA256 hashes, sizes, and descriptions. |
| Loading example | LOAD_EXAMPLE.py | Minimal checkpoint-loading script. |
| Reproducibility card | MODEL_CARD.md | Reproducibility metadata. |
This repository does not include base model weights. Load those from their upstream Hugging Face repositories.
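Since manifest.json records a SHA256 hash and size for each file, downloads can be checked before use. Below is a minimal sketch, assuming each manifest entry maps a relative path to a dict with a `sha256` field; the actual schema may differ, so adjust the lookup accordingly.

```python
import hashlib
import json
from pathlib import Path

def verify_against_manifest(repo_dir: str, rel_path: str) -> bool:
    """Check one downloaded file against manifest.json.

    Assumes manifest.json maps relative paths to entries with a
    "sha256" field; adapt the lookup if the real schema differs.
    """
    repo = Path(repo_dir)
    manifest = json.loads((repo / "manifest.json").read_text())
    expected = manifest[rel_path]["sha256"]

    h = hashlib.sha256()
    with open(repo / rel_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected

# Hypothetical usage, with paths taken from the layout below:
# verify_against_manifest(ckpt_dir, "writesae/qwen0p8b/L9_H4/best.pt")
```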
## Quickstart

```python
from huggingface_hub import snapshot_download
import torch

ckpt_dir = snapshot_download(
    "anon-writesae/matrix-sae-ckpts",
    allow_patterns=["writesae/qwen0p8b/L9_H4/*"],
)

ckpt = torch.load(
    f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt",
    weights_only=False,
    map_location="cpu",
)
print(ckpt["config"])
print(ckpt["val_mse"])

# Decoder atom 412, used as the paper's ERASE example.
v_412 = ckpt["sae"].decoder.v[412]  # (d_k,)
w_412 = ckpt["sae"].decoder.w[412]  # (d_v,)
atom = torch.outer(v_412, w_412)    # (d_k, d_v)
```
For a standalone example, see LOAD_EXAMPLE.py.
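Because every decoder atom is a rank-1 matrix, a full state-write reconstruction is a sparse combination of outer products. The sketch below illustrates that structure under assumptions: the `decoder.v` / `decoder.w` attributes follow the quickstart, but the code vector and the encoder interface are placeholders, not the repository's documented API.

```python
import torch

def reconstruct_write(sae, codes: torch.Tensor) -> torch.Tensor:
    """Rebuild a (d_k, d_v) state-write matrix from sparse codes.

    codes: (n_atoms,) sparse coefficients, e.g. produced by the SAE
    encoder (the encoder interface is assumed, not documented here).
    """
    V = sae.decoder.v  # (n_atoms, d_k), as in the quickstart
    W = sae.decoder.w  # (n_atoms, d_v)
    # Sum of codes[i] * outer(V[i], W[i]) over all atoms.
    return torch.einsum("i,ik,iv->kv", codes, V, W)

# An ERASE-style edit of atom 412 amounts to zeroing its code
# before reconstruction:
# codes[412] = 0.0
# edited_write = reconstruct_write(ckpt["sae"], codes)
```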
## Repository layout

```
matrix-sae-ckpts/
  README.md
  MODEL_CARD.md
  manifest.json
  LOAD_EXAMPLE.py
  LICENSE
  writesae/
    qwen0p8b/L9_H4/        # primary cell
    qwen0p8b/L1_H4/
    qwen0p8b/L17_H4/
    qwen4b/L12_H8/
    qwen27b/L32_H16/
    mamba2-370m/L24_H0/
    rwkv7-1.5b/L12_H0/
    deltanet-1.3b/L12_H8/
    gla-1.3b/L12_H0/
  flat_baseline/
    qwen0p8b_L9_H4/
    mamba2-370m_L24_H0/
    rwkv7-1.5b_L12_H0/
  results/
    92pct_substitution/
    89pct_population_test_L9_H4/
    closed_form_R2_per_layer/
    memory_edit_F412/
    predictive_steering_84pct/
    behavioral_steering_100pct_midrank/
    cross_arch_mamba2_88pct/
    all_16_heads_L9_representativeness/
    falsification/
    ablations/
```
## Included model families

| Family | Checkpoint path | Role |
|---|---|---|
| Qwen3.5-0.8B | writesae/qwen0p8b/ | Main experiments and ablations. |
| Qwen3.5-4B | writesae/qwen4b/L12_H8/ | Cross-scale check. |
| Qwen3.5-27B | writesae/qwen27b/L32_H16/ | Cross-scale replication. |
| Mamba-2-370M | writesae/mamba2-370m/L24_H0/ | Cross-architecture test. |
| RWKV-7-1.5B | writesae/rwkv7-1.5b/L12_H0/ | Cross-architecture test. |
| DeltaNet-1.3B | writesae/deltanet-1.3b/L12_H8/ | Cross-architecture test. |
| GLA-1.3B | writesae/gla-1.3b/L12_H0/ | Cross-architecture test. |
## Main result files
| Result | Files |
|---|---|
| 92.4% substitution wins, n=4,851 | results/92pct_substitution/ |
| 89.8% population test over 87 atoms | results/89pct_population_test_L9_H4/ |
| Closed-form factorization, R^2 = 0.98 | results/closed_form_R2_per_layer/ |
| F412 ERASE memory edit, -0.116 nats | results/memory_edit_F412/ |
| Predictive steering, 84.6% sign agreement | results/predictive_steering_84pct/ |
| Behavioral steering, 100% midrank install | results/behavioral_steering_100pct_midrank/ |
| Mamba-2 cross-architecture test, 88.08% | results/cross_arch_mamba2_88pct/ |
| Null and falsification checks | results/falsification/ |
| Appendix ablations | results/ablations/ |
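The result directories hold plain JSON, so they can be fetched and inspected without downloading any checkpoints. A minimal sketch follows; it only assumes the files are JSON, since the per-file field names are not documented here.

```python
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Fetch only the result files, skipping checkpoints.
results_dir = snapshot_download(
    "anon-writesae/matrix-sae-ckpts",
    allow_patterns=["results/*"],
)

# Print the top-level keys of every JSON under one claim directory.
for path in sorted(Path(results_dir, "results", "92pct_substitution").glob("*.json")):
    payload = json.loads(path.read_text())
    print(path.name, list(payload)[:10])
```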
## End-to-end reproduction

The code is in the anonymous source repository, separate from this checkpoint repository:

```bash
git clone https://anonymous.4open.science/r/WriteSAE-6158
cd WriteSAE-6158
pip install -e .
python scripts/clean_amplified_kl.py \
    --feature 412 \
    --sae-checkpoint <hf-cache>/writesae/qwen0p8b/L9_H4/best.pt \
    --states-dir <local>/states/Qwen3.5-0.8B/L9 \
    --out out/headline.json
```
You will need local activation caches for commands that rerun interventions. This Hugging Face repo stores checkpoints and reported outputs, not the full extracted state cache.
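To fill in the `<hf-cache>` placeholder above, one option is to download just the primary-cell checkpoint and pass its cached path to `--sae-checkpoint`; a small sketch, using the same repo id and file path as the quickstart:

```python
from huggingface_hub import hf_hub_download

# Fetch only the primary-cell checkpoint and print its local path,
# which can be passed as --sae-checkpoint above.
ckpt_path = hf_hub_download(
    "anon-writesae/matrix-sae-ckpts",
    "writesae/qwen0p8b/L9_H4/best.pt",
)
print(ckpt_path)
```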
## Intended use
Use these files for:
- mechanistic interpretability research on recurrent state writes;
- reproducing the paper's SAE substitution and steering analyses;
- comparing rank-1 decoder SAEs with flat baselines;
- follow-up work on state-write interventions.
Do not use these files as production model-editing tools or as safety interventions without independent validation.
## Limitations
- The strongest firing-level causal evidence is centered on Qwen3.5-0.8B, layer 9, head 4.
- Cross-scale and cross-architecture files are included, but they do not make the same per-atom identity claim as the primary cell.
- SAE atom identity is seed-specific. The paper's transferable claim is about class-level structure, not exact atom matching across runs.
- Base model licenses and behavior are inherited from the upstream model providers.
- Several reproduction commands require activation caches that are too large to include here.
## License

- Checkpoints and code in this artifact: MIT, see LICENSE.
- Base models keep their upstream licenses.
- No base model weights are redistributed in this repository.
## Citation

```bibtex
@inproceedings{anon2026writesae,
  title     = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author    = {Anonymous},
  booktitle = {Submitted to NeurIPS 2026},
  year      = {2026}
}
```