ODILE adapters

LoRA adapters for ODILE, the orthogonalize / strict-deny endpoint of the ALICE family of weight-level defenses against indirect prompt injection in tool-using LLM agents. Where ALICE redirects harmful tool-call representations onto their benign twin, ODILE pushes them away from a fixed harmful direction. The result is very strong refusal of injected instructions, at the cost of conservative behavior on benign trajectories.

This repository releases one adapter per backbone:

| Adapter | Base model | LoRA layers |
|---|---|---|
| `ODILE-Llama-3-1-8B` | `meta-llama/Llama-3.1-8B-Instruct` | L12-22 |
| `ODILE-Llama-3-3-70B` (headline) | `meta-llama/Llama-3.3-70B-Instruct` | L30-55 |
| `ODILE-Qwen-2-5-7B` | `Qwen/Qwen2.5-7B-Instruct` | L10-19 |
| `ODILE-Qwen-2-5-14B` | `Qwen/Qwen2.5-14B-Instruct` | L18-33 |
| `ODILE-Qwen3-32B` | `Qwen/Qwen3-32B` | L24-44 |

A matching ODILE-Qwen3-8B adapter was not included in this release; use the Qwen3-8B ALICE adapter at memo-ozdincer/alice-adapters/ALICE-Qwen3-8B for the Qwen3-8B backbone instead.
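
The adapter table and the Qwen3-8B fallback can be captured in a small lookup. This is a minimal sketch, not part of the released tooling: `resolve_adapter` is a hypothetical helper, the `Qwen/Qwen3-8B` base-model ID is assumed, and splitting `memo-ozdincer/alice-adapters/ALICE-Qwen3-8B` into a repo ID plus a subfolder assumes the ALICE repo follows the same layout as this one.

```python
# Hypothetical helper: maps a base-model ID to the (repo_id, subfolder) pair
# that would be passed to PeftModel.from_pretrained.
ODILE_REPO = "memo-ozdincer/odile-adapters"
ALICE_REPO = "memo-ozdincer/alice-adapters"

ODILE_SUBFOLDERS = {
    "meta-llama/Llama-3.1-8B-Instruct": "ODILE-Llama-3-1-8B",
    "meta-llama/Llama-3.3-70B-Instruct": "ODILE-Llama-3-3-70B",
    "Qwen/Qwen2.5-7B-Instruct": "ODILE-Qwen-2-5-7B",
    "Qwen/Qwen2.5-14B-Instruct": "ODILE-Qwen-2-5-14B",
    "Qwen/Qwen3-32B": "ODILE-Qwen3-32B",
}

# Assumed base-model ID; this release lists no ODILE adapter for it.
ALICE_FALLBACKS = {"Qwen/Qwen3-8B": "ALICE-Qwen3-8B"}


def resolve_adapter(base_model: str) -> tuple[str, str]:
    """Return (repo_id, subfolder) for a supported base model."""
    if base_model in ODILE_SUBFOLDERS:
        return ODILE_REPO, ODILE_SUBFOLDERS[base_model]
    if base_model in ALICE_FALLBACKS:
        return ALICE_REPO, ALICE_FALLBACKS[base_model]
    raise KeyError(f"no ODILE or fallback adapter listed for {base_model!r}")
```

The returned pair is meant to slot into the `PeftModel.from_pretrained(base, repo_id, subfolder=subfolder)` call shown under "How to load".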

All adapters use LoRA rank 16 and alpha 32, with target modules `q_proj`, `v_proj`, `down_proj`, and `up_proj`.
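
As a rough sketch, these hyperparameters correspond to the keyword arguments below for `peft.LoraConfig` (`r`, `lora_alpha`, `target_modules`, and `layers_to_transform` are existing `LoraConfig` fields). The helper names are hypothetical, and reading a range such as L12-22 as inclusive layer indices is an assumption; the exact training configuration is not stated in this card.

```python
def parse_layer_range(spec: str) -> list[int]:
    """Turn a table entry such as 'L12-22' into a list of layer indices.

    Assumption: the range is inclusive at both ends.
    """
    start, end = spec.lstrip("L").split("-")
    return list(range(int(start), int(end) + 1))


def lora_kwargs(layer_spec: str) -> dict:
    """Keyword arguments that could be passed to peft.LoraConfig(**kwargs)."""
    return {
        "r": 16,
        "lora_alpha": 32,
        "target_modules": ["q_proj", "v_proj", "down_proj", "up_proj"],
        "layers_to_transform": parse_layer_range(layer_spec),
    }


kwargs = lora_kwargs("L12-22")  # row for ODILE-Llama-3-1-8B
print(len(kwargs["layers_to_transform"]))  # 11 layers under the inclusive reading
```

Under the default LoRA scaling of alpha / rank, these settings give a scaling factor of 2.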

When to use ODILE vs. ALICE

| Goal | Use |
|---|---|
| Lowest ASR and preserved utility (recommended) | ALICE (`memo-ozdincer/alice-adapters`) |
| Strict refusal of any injected instruction | ODILE (this repo) |

On the Llama-3.3-70B headline grid, ODILE (the orthogonalize variant of ALICE) holds attack success rate (ASR) at the same near-zero level as ALICE, but utility under attack collapses to ~8.3%, versus 41.7% for ALICE and 43.9% with no defense. The adapter is included here for ablation, transparency, and use cases where strict denial takes priority over utility.
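
To put the utility numbers in relative terms, the short calculation below divides each utility-under-attack figure by the no-defense baseline. It uses only the three percentages quoted above; the ODILE figure is approximate in the source, so its ratio is approximate too.

```python
# Utility under attack on the Llama-3.3-70B headline grid, in percent.
NO_DEFENSE = 43.9
UTILITY = {"ALICE": 41.7, "ODILE": 8.3}  # ODILE value is approximate (~8.3)


def retained_fraction(utility: float, baseline: float = NO_DEFENSE) -> float:
    """Fraction of the undefended utility that a defense keeps."""
    return utility / baseline


for name, value in UTILITY.items():
    print(f"{name}: {retained_fraction(value):.1%} of no-defense utility")
# ALICE: 95.0% of no-defense utility
# ODILE: 18.9% of no-defense utility
```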

How to load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

upstream = "meta-llama/Llama-3.3-70B-Instruct"
base = AutoModelForCausalLM.from_pretrained(upstream, torch_dtype="auto", device_map="auto")
tok  = AutoTokenizer.from_pretrained(upstream)
model = PeftModel.from_pretrained(base, "memo-ozdincer/odile-adapters", subfolder="ODILE-Llama-3-3-70B")

Or use the included helper in the GitHub repo:

git clone https://github.com/memo-ozdincer/ODILE
cd ODILE
uv run python scripts/download_adapters.py --adapter ODILE-Llama-3-3-70B

Companion paper & code

Paper: Weight-Level Defenses Improve LLM Agent Adversarial Robustness (NeurIPS 2026 submission, under review)
Code: https://github.com/memo-ozdincer/ODILE

License

Apache-2.0 for the LoRA deltas. Redistributed base-model weights remain subject to each upstream model's own license.
