---
license: mit
library_name: pytorch
pipeline_tag: text-generation
language:
- en
tags:
- knowledge-editing
- model-editing
- selective-prediction
- abstention
- byte-level
- interpretability
- research-prototype
- crud
- editable-llm
- tiny-language-model
- cpu
- knowledge-base
- rome
- memit
---

# Yaz: an editable, auditable tiny knowledge model that abstains when unsure

Yaz is a sub-1M-parameter (≈808K) byte-level language model. You can create, read, update, and
delete its individual facts one at a time, with per-edit locality that holds by construction. When it
isn't confident which fact you mean, it abstains instead of guessing. It runs offline on CPU only.

> Status: research prototype. Small-scale and honestly scoped. A clean, reproducible demonstration,
> not a state-of-the-art result and not a defensible new capability. Read the limitations below.

- 📄 Technical report: [`paper/yaz-technical-report.pdf`](paper/yaz-technical-report.pdf)
- 💻 Code & reproduction: https://github.com/TilelliLab/Yaz
- 🔬 Each fact = one decoder column (atom); routing by a frozen `all-MiniLM-L6-v2` embedding.

## How it works

Each fact lives in its own addressable atom (a decoder column). A frozen sentence embedding
routes a prompt to a fact by meaning, so paraphrases reach the same fact. UPDATE swaps a column,
DELETE zeroes it, CREATE allocates a fresh one, and READ is just routing. None of these requires
retraining. The routing confidence margin (top-1 score minus top-2 score) serves as an "I don't know
which fact you mean" signal. When the margin is too small, the model refuses the query.
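Here is a toy, pure-Python sketch of the mechanism above. It is not the Yaz implementation. `ToyAtomStore`, `cosine`, the 3-d vectors, and the `MIN_MARGIN` threshold are all hypothetical stand-ins. Real routing uses `all-MiniLM-L6-v2` embeddings, and a real atom is a decoder column, not a Python string. The sketch only shows two things: how the four operations map to per-atom writes, and how the top-1 minus top-2 margin becomes an abstention signal.

```python
import math

MIN_MARGIN = 0.1  # hypothetical threshold on the (top-1 - top-2) routing margin


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyAtomStore:
    """Toy stand-in: each fact owns one atom = a frozen routing key + a payload column."""

    def __init__(self):
        self.keys = {}     # fact -> routing embedding (never retrained)
        self.columns = {}  # fact -> stored answer (stand-in for a decoder column)

    def create(self, fact, key, answer):
        self.keys[fact] = key          # allocate a fresh atom
        self.columns[fact] = answer

    def update(self, fact, answer):
        self.columns[fact] = answer    # swap one column; no other atom is touched

    def delete(self, fact):
        self.columns[fact] = None      # stand-in for zeroing the column

    def read(self, query_emb):
        """Route by similarity; abstain if the top-1/top-2 margin is too small."""
        ranked = sorted(
            ((cosine(query_emb, key), fact) for fact, key in self.keys.items()),
            reverse=True,
        )
        top1 = ranked[0][0] if ranked else 0.0
        top2 = ranked[1][0] if len(ranked) > 1 else 0.0
        if top1 - top2 < MIN_MARGIN:
            return "ABSTAIN"
        return self.columns[ranked[0][1]]  # None if the fact was deleted


store = ToyAtomStore()
store.create("France", [1.0, 0.0, 0.0], "Paris")
store.create("Peru", [0.0, 1.0, 0.0], "Lima")
store.create("Japan", [0.0, 0.0, 1.0], "Tokyo")

print(store.read([0.9, 0.1, 0.0]))  # -> Paris
store.update("France", "Lima")      # like the demo's --edit France=Lima
print(store.read([0.9, 0.1, 0.0]))  # -> Lima
print(store.read([0.6, 0.6, 0.0]))  # France and Peru tie -> ABSTAIN
store.delete("Japan")
print(store.read([0.0, 0.1, 0.9]))  # -> None (fact gone)
```

Each write touches exactly one entry, so the other facts are untouched. That is the toy analogue of per-edit locality.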

## Download

Use the Hugging Face Hub client, which handles the Git-LFS weights for you. **Don't `git clone` without
`git-lfs`**: you'd get 132-byte LFS pointer files instead of the real `model.safetensors` / `.pt`.

```bash
pip install huggingface_hub
```
```python
from huggingface_hub import snapshot_download

repo = snapshot_download("TilelliLab/Yaz")  # -> local path with all files (real weights)
print(repo)
# then: cd into `repo` and run the snippets below, or `python demo.py --demo`.
```
```bash
# requirements.txt ships with the repo, so run this from inside `repo`.
# CPU-only deps (avoids the multi-GB CUDA stack):
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
```
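If you did clone with plain `git`, you can tell a Git-LFS pointer from real weights by reading the first bytes of the file. Under the Git LFS pointer format, a pointer is a tiny text file whose first line is `version https://git-lfs.github.com/spec/v1`. The `is_lfs_pointer` helper below is a hypothetical convenience and is not part of this repo.

```python
from pathlib import Path

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if `path` looks like a Git-LFS pointer file rather than real content."""
    p = Path(path)
    if p.stat().st_size > 1024:  # pointers are tiny (well under 1 KB); weights are not
        return False
    with p.open("rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


if __name__ == "__main__":
    for name in ("model.safetensors", "checkpoints/yaz_gen_semantic_v2.pt"):
        if Path(name).exists():
            status = "LFS pointer (run `git lfs pull`)" if is_lfs_pointer(name) else "real file"
            print(f"{name}: {status}")
```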

## Load it (safetensors, no pickle)

```python
# files in this repo: model.safetensors, yaz_meta.json, load_yaz.py, yaz/ (model code)
# run from inside the downloaded repo so that `load_yaz` is importable
from load_yaz import load_yaz

model, cfg, meta = load_yaz()  # 807,680 params, 50 fact-atoms
print(meta["country_to_target_atom"]["France"])  # -> 0
```
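Because `model.safetensors` is pickle-free, you can inspect it with the standard library alone. A safetensors file begins with an 8-byte little-endian unsigned integer giving the length of a JSON header. That header maps each tensor name to its `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry.

The helpers below are hypothetical and not part of `load_yaz.py`. They sum the shapes listed in the header. You can compare the result with the 807,680 parameters reported by `load_yaz()`. Any non-parameter buffers stored in the file would make the two counts differ.

```python
import json
import math
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Read only the JSON header of a .safetensors file (no tensor data, no pickle)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len))


def count_elements(path):
    """Total number of scalar elements over every tensor listed in the header."""
    header = read_safetensors_header(path)
    return sum(
        math.prod(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"  # optional string->string metadata, not a tensor
    )


if __name__ == "__main__" and Path("model.safetensors").exists():
    for name, info in read_safetensors_header("model.safetensors").items():
        if name != "__metadata__":
            print(name, info["dtype"], info["shape"])
    print("total elements:", count_elements("model.safetensors"))
```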

Run the full routing + abstention + live-edit demo (needs `pip install -r requirements.txt`):

```bash
python demo.py --demo
python demo.py --prompt "the country of the Eiffel Tower, its capital is "
python demo.py --prompt "The capital of France is " --edit France=Lima
python demo.py --prompt "best pizza topping?"  # -> ABSTAIN (out of scope)
```

The original PyTorch checkpoint is also included at `checkpoints/yaz_gen_semantic_v2.pt` for fidelity;
`model.safetensors` is the recommended (pickle-free) artifact.

## What it can do (measured)

| Capability | Result |
|---|---|
| UPDATE (edit, no retraining) | in-distribution reliability 1.000; edits land 8/8 (first byte) |
| DELETE | fact gone, 0 collateral |
| CREATE | passes 4/4 battery (monosemantic / local / readable / deletable) |
| Per-edit locality | 0/10 collateral; bits-per-character change +0.000% across 40 sequential edits |
| Paraphrase-robust routing | held-out reach 0.696 vs 0.216 for surface routing |
| Abstain when unsure | risk-coverage AURC 0.004, near the oracle's 0.003 (lower is better) |

All numbers reproduce with the public `all-MiniLM-L6-v2` embedder (no internal dependencies), seed 2026, CPU.
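The last row uses AURC, the area under the risk-coverage curve. The procedure is:

1. Sort queries by confidence (for Yaz, the routing margin).
2. Answer only the k most confident queries.
3. Record the error rate (risk) among those k.
4. Trace that risk against coverage k/n.

A lower area means errors are pushed to the low-confidence end.

The sketch below uses one common discrete definition, the mean selective risk over k = 1..n, on toy data. It is not the evaluation script behind the table. The "oracle" simply ranks every correct answer above every error.

```python
def aurc(confidences, correct):
    """Area under the risk-coverage curve, discretized as mean selective risk over k = 1..n."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    errors, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        total += errors / k  # risk when answering only the k most confident queries
    return total / len(order)


def oracle_aurc(correct):
    """Best achievable AURC: every correct answer ranked above every error."""
    return aurc([1.0 if c else 0.0 for c in correct], correct)


margins = [0.9, 0.8, 0.7, 0.2, 0.1]        # toy routing margins
correct = [True, True, True, False, True]  # toy outcomes
print(round(aurc(margins, correct), 4), round(oracle_aurc(correct), 4))  # -> 0.09 0.04
```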

## How it compares

Yaz sits in the side-memory family of editors (like SERAC/GRACE/WISE/PENME), not the weight-editing
family (ROME/MEMIT/MEND). It differs in two honest ways. First, its edit is a structural decoder-column
swap, so locality holds by construction. Second, on low routing confidence it abstains instead of
answering with the base model, something we have not found in any published editor. On a controlled
tiny task, weight editing (ROME/MEMIT) lands the new answer but loses selectivity as edits accumulate
(22–67% of other facts change), while Yaz holds 0%. Yaz is, however, far smaller-scale, and its update
reliability degrades past its capacity. **Full, honest comparison (incl. ROME/MEMIT vs Yaz at
5/50/200 facts):**
[COMPARISON.md on GitHub](https://github.com/TilelliLab/Yaz/blob/main/COMPARISON.md).
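The selectivity figures above come from comparing predictions before and after editing. Below is a minimal sketch of the two quantities involved. It assumes predictions are available as a `fact -> answer` dict. The function names and the data are hypothetical and are not taken from COMPARISON.md.

- Edit success is the share of edited facts that now give the requested answer.
- Collateral is the share of non-edited facts whose answer changed.

```python
def edit_success(after, targets):
    """Share of edited facts whose prediction now equals the requested answer."""
    if not targets:
        return 0.0
    return sum(after.get(f) == t for f, t in targets.items()) / len(targets)


def collateral_rate(before, after, edited):
    """Share of non-edited facts whose prediction changed after editing."""
    others = [f for f in before if f not in edited]
    if not others:
        return 0.0
    return sum(after.get(f) != before[f] for f in others) / len(others)


before = {"France": "Paris", "Peru": "Lima", "Japan": "Tokyo", "Kenya": "Nairobi"}
after = {"France": "Lima", "Peru": "Lima", "Japan": "Tokyo", "Kenya": "Lisbon"}
print(edit_success(after, {"France": "Lima"}))            # -> 1.0
print(collateral_rate(before, after, edited={"France"}))  # Kenya changed: 1 of 3 others
```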

## Limitations (read these)

- First-byte editor. Edits set only the answer's first byte; multi-byte generation is not faithful
  (full-word transfer ≈ 0.05).
- A retracted claim. An earlier "edit-generalization" headline of 0.675 was retracted: a
  random-column-swap control scores ≈ 0.688, so that number was at chance. What survives is routing
  reach, not an edit-magnitude effect.
- Fragile routing on oblique, name-free clues (≈0.85 on famous clues → ≈0.50 on oblique ones).
- Structural locality holds only while no two facts share an atom.
- Tiny, synthetic scope: 50 country→capital facts, single seed, CPU.
- Not a moat. The mechanisms already exist in the literature (ROME/MEMIT, GRACE, SERAC, PENME; EasyEdit).
  Yaz combines them cleanly and reproducibly; it is an engineering contribution, not a unique capability.
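Two of these limits can be checked mechanically.

The structural-locality caveat reduces to whether the fact-to-atom map is one-to-one. The sketch assumes that `meta["country_to_target_atom"]`, returned by `load_yaz()`, maps each country to a single integer atom index. We infer that only from the `France -> 0` example above.

The first-byte caveat is the gap between matching the answer's first byte and matching the whole word.

All helper names here are hypothetical, and the match criteria are our assumed reading of those metrics.

```python
from collections import defaultdict


def shared_atoms(fact_to_atom):
    """Return {atom: [facts]} for every atom claimed by more than one fact."""
    owners = defaultdict(list)
    for fact, atom in fact_to_atom.items():
        owners[atom].append(fact)
    return {atom: facts for atom, facts in owners.items() if len(facts) > 1}


def first_byte_match(generated, target):
    """Assumed first-byte criterion: only the first UTF-8 byte must agree."""
    g, t = generated.encode("utf-8"), target.encode("utf-8")
    return len(g) > 0 and len(t) > 0 and g[0] == t[0]


def full_word_match(generated, target):
    """Stricter criterion: the first generated word must equal the whole target."""
    words = generated.split()
    return bool(words) and words[0].strip(".,;:!?") == target


# Toy mapping, not the real file; with the model use shared_atoms(meta["country_to_target_atom"]).
toy = {"France": 0, "Peru": 1, "Chile": 1}
print(shared_atoms(toy))  # -> {1: ['Peru', 'Chile']}
print(first_byte_match("Lyon", "Lima"), full_word_match("Lyon", "Lima"))  # -> True False
```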

## Citation

See [`CITATION.cff`](https://github.com/TilelliLab/Yaz/blob/main/CITATION.cff). MIT licensed. © 2026 Tilelli LAB.