ODILE adapters
LoRA adapters for ODILE — the orthogonalize / strict-deny endpoint of the ALICE family of weight-level defenses against indirect prompt injection in tool-using LLM agents. ODILE pushes harmful tool-call representations away from a fixed harmful direction (rather than redirecting them onto the benign twin as ALICE does). The result is very strong refusal of injected instructions, at the cost of conservative behavior on benign trajectories.
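As a purely geometric illustration of what "away from a fixed direction" can mean: one common way to make a vector orthogonal to a direction is to subtract its projection onto that direction. The sketch below assumes plain Python lists and a non-zero direction `d`; the function name `project_out` and the toy vectors are hypothetical. It is not the ODILE training objective, which is defined in the companion paper and code.

```python
import math


def project_out(h, d):
    """Return h with its component along direction d removed."""
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [x / norm for x in d]                      # normalize the direction
    coeff = sum(a * b for a, b in zip(h, unit))       # scalar projection of h onto d
    return [a - coeff * b for a, b in zip(h, unit)]   # subtract the parallel part


# Toy example (hypothetical values): remove the x-axis component.
h = [3.0, 4.0, 0.0]
d = [1.0, 0.0, 0.0]
h_perp = project_out(h, d)
```

After projection, the dot product of `h_perp` with `d` is zero, which is the sense in which the result is orthogonal to the chosen direction.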
This repository releases one adapter per backbone:
| Adapter | Base model | LoRA layers |
|---|---|---|
| ODILE-Llama-3-1-8B | meta-llama/Llama-3.1-8B-Instruct | L12-22 |
| ODILE-Llama-3-3-70B (headline) | meta-llama/Llama-3.3-70B-Instruct | L30-55 |
| ODILE-Qwen-2-5-7B | Qwen/Qwen2.5-7B-Instruct | L10-19 |
| ODILE-Qwen-2-5-14B | Qwen/Qwen2.5-14B-Instruct | L18-33 |
| ODILE-Qwen3-32B | Qwen/Qwen3-32B | L24-44 |
A matching ODILE-Qwen3-8B adapter was not included in this release; use
the Qwen3-8B ALICE adapter at
memo-ozdincer/alice-adapters/ALICE-Qwen3-8B
for the Qwen3-8B backbone instead.
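The backbone-to-adapter mapping above can be captured in a small lookup, sketched here under a few assumptions: the helper name `resolve_adapter` is hypothetical, the Qwen3-8B base model ID is assumed to be `Qwen/Qwen3-8B`, and the ALICE fallback is assumed to follow the same repository-plus-subfolder layout as this repo.

```python
# (repo_id, subfolder) per base model, taken from the table above.
ODILE_REPO = "memo-ozdincer/odile-adapters"
ADAPTERS = {
    "meta-llama/Llama-3.1-8B-Instruct": (ODILE_REPO, "ODILE-Llama-3-1-8B"),
    "meta-llama/Llama-3.3-70B-Instruct": (ODILE_REPO, "ODILE-Llama-3-3-70B"),
    "Qwen/Qwen2.5-7B-Instruct": (ODILE_REPO, "ODILE-Qwen-2-5-7B"),
    "Qwen/Qwen2.5-14B-Instruct": (ODILE_REPO, "ODILE-Qwen-2-5-14B"),
    "Qwen/Qwen3-32B": (ODILE_REPO, "ODILE-Qwen3-32B"),
    # Assumption: no ODILE adapter exists for this backbone, so fall back to ALICE.
    "Qwen/Qwen3-8B": ("memo-ozdincer/alice-adapters", "ALICE-Qwen3-8B"),
}


def resolve_adapter(base_model):
    """Return (repo_id, subfolder) for a supported base model ID."""
    try:
        return ADAPTERS[base_model]
    except KeyError:
        raise ValueError(f"no released adapter for {base_model!r}") from None


repo_id, subfolder = resolve_adapter("Qwen/Qwen3-32B")
```

The returned pair is shaped to match the repository ID and `subfolder` arguments that `PeftModel.from_pretrained` accepts.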
All adapters use rank 16 and alpha 32, with target modules q_proj, v_proj, down_proj, and up_proj.
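For readers who want to reproduce the adapter shape, the hyperparameters can be written as keyword arguments named after the corresponding `peft.LoraConfig` fields (`r`, `lora_alpha`, `target_modules`, `layers_to_transform`). This is a minimal sketch, not the authors' training configuration: the helper names are hypothetical, and it assumes the "LoRA layers" labels such as `L12-22` denote inclusive, zero-based layer indices, which this card does not state.

```python
import re


def parse_layer_range(label):
    """Turn a label like 'L12-22' into [12, 13, ..., 22] (assumed inclusive)."""
    match = re.fullmatch(r"L(\d+)-(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized layer label: {label!r}")
    first, last = int(match.group(1)), int(match.group(2))
    return list(range(first, last + 1))


def lora_kwargs(layer_label):
    """Build keyword arguments shaped like peft.LoraConfig fields."""
    return {
        "r": 16,                      # rank, as stated in this card
        "lora_alpha": 32,             # alpha, as stated in this card
        "target_modules": ["q_proj", "v_proj", "down_proj", "up_proj"],
        "layers_to_transform": parse_layer_range(layer_label),  # assumption
    }


kwargs = lora_kwargs("L12-22")  # ODILE-Llama-3-1-8B row
```

With PEFT installed, the dictionary could be passed as `LoraConfig(**kwargs)`. Other settings such as dropout and bias are not given in this card and are left at whatever defaults apply.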
When to use ODILE vs. ALICE
| Goal | Use |
|---|---|
| Lowest ASR and preserved utility (recommended) | ALICE (memo-ozdincer/alice-adapters) |
| Strict refusal of any injected instruction | ODILE (this repo) |
On the Llama-3.3-70B headline grid, ODILE (the orthogonalize variant of ALICE) holds attack success rate (ASR) at the same near-zero level as ALICE, but utility under attack collapses to ~8.3% (versus 41.7% for ALICE and 43.9% with no defense). The adapter is included here for ablation, for transparency, and for use cases where strict denial takes priority over utility.
How to load
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

upstream = "meta-llama/Llama-3.3-70B-Instruct"
base = AutoModelForCausalLM.from_pretrained(upstream, torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained(upstream)
model = PeftModel.from_pretrained(base, "memo-ozdincer/odile-adapters", subfolder="ODILE-Llama-3-3-70B")
```
Or use the included helper in the GitHub repo:
```shell
git clone https://github.com/memo-ozdincer/ODILE
cd ODILE
uv run python scripts/download_adapters.py --adapter ODILE-Llama-3-3-70B
```
Companion paper & code
- Paper: Weight-Level Defenses Improve LLM Agent Adversarial Robustness (NeurIPS 2026 submission, under review)
- Code: https://github.com/memo-ozdincer/ODILE
License
Apache-2.0 for the LoRA deltas. Redistributed base-model weights remain subject to each upstream model's own license.