---
license: mit
library_name: pytorch
pipeline_tag: text-generation
language:
- en
tags:
- knowledge-editing
- model-editing
- selective-prediction
- abstention
- byte-level
- interpretability
- research-prototype
- crud
- editable-llm
- tiny-language-model
- cpu
- knowledge-base
- rome
- memit
---
# Yaz — an editable, auditable tiny knowledge model that abstains when unsure
**Yaz** is a sub-1M-parameter (≈808K), byte-level language model whose individual facts you can
**create, read, update, and delete** one at a time — with **provable per-edit locality** — and that
**abstains** when it isn't confident which fact you mean, instead of guessing. CPU-only, offline.
> **Status: research prototype.** Small-scale and honestly scoped. A clean, reproducible demonstration —
> **not** a state-of-the-art result and **not** a defensible new capability. Read the limitations below.
- 📄 Technical report: [`paper/yaz-technical-report.pdf`](paper/yaz-technical-report.pdf)
- 💻 Code & reproduction: https://github.com/TilelliLab/Yaz
- 🔬 Each fact = one decoder column (*atom*); routing by a frozen `all-MiniLM-L6-v2` embedding.
## How it works
Each fact lives in its own addressable **atom** (a decoder column). A **frozen sentence embedding**
routes a prompt to a fact by meaning, so paraphrases reach the same fact. UPDATE swaps a column,
DELETE zeroes it, CREATE allocates a fresh one, READ is just routing — no retraining. The routing
**confidence margin** (top-1 − top-2) is used as an "I don't know which fact you mean" signal, so the
model refuses low-confidence queries.
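The mechanism above can be sketched in a few lines of NumPy. This is a toy illustration, not the code in `yaz/`: the class name `ToyAtomStore`, the `threshold` value, the use of cosine similarity as the routing score, and the hand-made 3-D vectors standing in for `all-MiniLM-L6-v2` embeddings are all assumptions made for the example.

```python
import numpy as np

class ToyAtomStore:
    """Toy sketch: one decoder column (atom) per fact, routed by cosine similarity."""

    def __init__(self, d_key, d_out, capacity, threshold=0.1):
        self.keys = np.zeros((capacity, d_key))     # frozen routing keys, one row per atom
        self.decoder = np.zeros((d_out, capacity))  # one column per fact
        self.n_used = 0
        self.threshold = threshold                  # hypothetical abstention margin

    def create(self, key, column):
        if self.n_used >= self.keys.shape[0]:
            raise RuntimeError("no free atom left")
        idx = self.n_used                           # allocate a fresh column
        self.keys[idx] = key / np.linalg.norm(key)
        self.decoder[:, idx] = column
        self.n_used += 1
        return idx

    def update(self, idx, column):
        self.decoder[:, idx] = column               # swap exactly one column

    def delete(self, idx):
        self.decoder[:, idx] = 0.0                  # zero exactly one column

    def read(self, query):
        """Return the routed atom index, or None to abstain."""
        if self.n_used < 2:
            return None                             # sketch simplification: margin needs two candidates
        q = query / np.linalg.norm(query)
        scores = self.keys[: self.n_used] @ q
        top2, top1 = np.sort(scores)[-2:]
        if top1 - top2 < self.threshold:            # confidence margin: top-1 minus top-2
            return None
        return int(np.argmax(scores))

store = ToyAtomStore(d_key=3, d_out=2, capacity=4)
france = store.create(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0]))
peru = store.create(np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0]))

before = store.decoder[:, peru].copy()
store.update(france, np.array([0.0, 3.0]))
assert np.array_equal(store.decoder[:, peru], before)  # the other column is untouched

print(store.read(np.array([0.9, 0.1, 0.0])))  # clear winner: routes to `france`
print(store.read(np.array([1.0, 1.0, 0.0])))  # tie between two atoms: abstains
```

Because an edit writes to a single column, every other column is unchanged by construction, which is the sense in which locality is structural in this sketch.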
## Download
Use the Hugging Face Hub (handles the Git-LFS weights for you). **Don't `git clone` without
`git-lfs`** — you'd get 132-byte LFS pointer files instead of the real `model.safetensors` / `.pt`.
```bash
pip install huggingface_hub
# CPU-only deps (avoids the multi-GB CUDA stack); run this from the downloaded repo, where requirements.txt lives:
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
```
```python
from huggingface_hub import snapshot_download
repo = snapshot_download("TilelliLab/Yaz") # -> local path with all files (real weights)
# then: cd into `repo` and run the snippets below, or `python demo.py --demo`.
```
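If you are unsure whether a local copy holds real weights or LFS stubs, a quick check is possible. The sketch below relies only on the Git-LFS pointer format (a small text file that begins with a fixed `version` line); the helper name `is_lfs_pointer` is made up for this example and is not part of the repo.

```python
from pathlib import Path

# First line of every Git-LFS pointer file, per the Git-LFS spec.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if `path` looks like a Git-LFS pointer stub rather than real data."""
    p = Path(path)
    if p.stat().st_size >= 1024:  # pointer files are tiny text files
        return False
    return p.read_bytes().startswith(LFS_HEADER)
```

For example, `is_lfs_pointer("model.safetensors")` returning `True` would mean the file was cloned without `git-lfs` and should be re-downloaded through the Hub.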
## Load it (safetensors, no pickle)
```python
# files in this repo: model.safetensors, yaz_meta.json, load_yaz.py, yaz/ (model code)
from load_yaz import load_yaz
model, cfg, meta = load_yaz() # 807,680 params, 50 fact-atoms
print(meta["country_to_target_atom"]["France"]) # -> 0
```
Run the full routing + abstention + live-edit demo (needs `pip install -r requirements.txt`):
```bash
python demo.py --demo
python demo.py --prompt "the country of the Eiffel Tower, its capital is "
python demo.py --prompt "The capital of France is " --edit France=Lima
python demo.py --prompt "best pizza topping?" # -> ABSTAIN (out of scope)
```
The original PyTorch checkpoint is also included at `checkpoints/yaz_gen_semantic_v2.pt` for fidelity;
`model.safetensors` is the recommended (pickle-free) artifact.
## What it can do (measured)
| Capability | Result |
|---|---|
| UPDATE (edit, no retraining) | in-dist reliability 1.000; edits land 8/8 (first byte) |
| DELETE | fact gone, 0 collateral |
| CREATE | passes 4/4 battery (monosemantic / local / readable / deletable) |
| Per-edit locality | 0/10 collateral; bpc +0.000% across 40 sequential edits |
| Paraphrase-robust routing | held-out reach **0.696** vs 0.216 surface routing |
| Abstain when unsure | near-oracle risk-coverage **AURC 0.004** (oracle 0.003) |
All numbers reproduce with the public `all-MiniLM-L6-v2` embedder (no internal dependencies), seed 2026, CPU.
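For readers unfamiliar with the abstention metric, here is a minimal sketch of one common way to compute the area under the risk-coverage curve (AURC) and its oracle lower bound. It assumes a per-query confidence (for Yaz, this would be the routing margin) and a 0/1 correctness label; the exact definition used in the technical report may differ, and the function names and sample numbers are illustrative only.

```python
import numpy as np

def aurc(confidence, correct):
    """Mean selective risk over all coverage levels, answering most-confident queries first."""
    confidence = np.asarray(confidence, dtype=float)
    errors = 1.0 - np.asarray(correct, dtype=float)
    order = np.argsort(-confidence, kind="stable")        # most confident first
    counts = np.arange(1, len(errors) + 1)
    risk_at_coverage = np.cumsum(errors[order]) / counts  # error rate among the answered queries
    return float(risk_at_coverage.mean())

def oracle_aurc(correct):
    """Best achievable AURC: a ranking that places every correct answer before every error."""
    correct = np.asarray(correct, dtype=float)
    return aurc(correct, correct)

# Made-up numbers, not results from Yaz.
conf = [0.9, 0.8, 0.3, 0.1]
ok = [1, 1, 0, 1]
print(aurc(conf, ok), oracle_aurc(ok))
```

A confidence signal is "near-oracle" when the first number is close to the second, meaning errors are concentrated among the queries the model is least sure about.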
## How it compares
Yaz sits in the **side-memory** family of editors (like SERAC/GRACE/WISE/PENME), not the weight-editing
family (ROME/MEMIT/MEND). Two honest differences: its edit is a **structural** decoder-column swap (locality
by construction), and it **abstains** on low routing confidence instead of answering with the base model —
a feature no published editor has. On a controlled tiny task, weight editing (ROME/MEMIT) lands the new
answer but loses selectivity as edits accumulate (22–67% of other facts change), while Yaz holds 0%.
In exchange, Yaz operates at a far smaller scale, and its update reliability degrades past its capacity.
**Full, honest comparison (incl. ROME/MEMIT vs. Yaz at 5/50/200 facts):**
[COMPARISON.md on GitHub](https://github.com/TilelliLab/Yaz/blob/main/COMPARISON.md).
## Limitations (read these)
- **First-byte editor.** Edits set the answer's **first byte**; multi-byte generation is not faithful
(full-word transfer ≈ 0.05).
- **A retracted claim.** An earlier "edit-generalization" headline of 0.675 was **retracted** — a
random-column-swap control sits at ≈ 0.688, i.e. that number was at chance. What survives is routing
*reach*, not an edit-magnitude effect.
- **Fragile routing** on oblique, name-free clues (≈0.85 famous → ≈0.50 oblique).
- **Structural locality** holds only while no two facts share an atom.
- **Tiny, synthetic scope** — 50 country→capital facts, single seed, CPU.
- **Not a moat.** Mechanisms exist in the literature (ROME/MEMIT, GRACE, SERAC, PENME; EasyEdit). Yaz
combines them cleanly and reproducibly — an engineering contribution, not a unique capability.
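The structural-locality precondition above (no two facts sharing an atom) is easy to check on a fact-to-atom mapping such as `meta["country_to_target_atom"]`. The helper below is a sketch, not part of the repo; the name `shared_atoms` and the sample mappings are hypothetical, apart from France mapping to atom 0 as shown in the loading snippet.

```python
from collections import defaultdict

def shared_atoms(fact_to_atom):
    """Return {atom: [facts]} for every atom that is assigned to more than one fact."""
    by_atom = defaultdict(list)
    for fact, atom in fact_to_atom.items():
        by_atom[atom].append(fact)
    return {atom: facts for atom, facts in by_atom.items() if len(facts) > 1}

# Hypothetical mappings for illustration.
clean = {"France": 0, "Peru": 1, "Japan": 2}
clashing = {"France": 0, "Peru": 0, "Japan": 2}
print(shared_atoms(clean))     # empty: every fact has its own atom
print(shared_atoms(clashing))  # atom 0 is shared, so an edit there touches two facts
```

An empty result means an edit to one fact's column cannot alter another fact through that column.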
## Citation
See [`CITATION.cff`](https://github.com/TilelliLab/Yaz/blob/main/CITATION.cff). MIT licensed. © 2026 Tilelli LAB.