EmoDistill-7b

Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.

EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent. It decouples what emotion to show (an IQL emotion selector over a 28-emotion vocabulary) from how to express it (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge) — both learned from a fixed offline corpus of LLM-vs-LLM negotiations.

This repository hosts all eight model variants from the paper: a full IQL + LoRA-SFT + JPO stack and an emotion-free LoRA-SFT-only baseline, one of each per benchmark domain — CRAD, DESRD, SSAD, SSD — for direct head-to-head comparison.

🚧 Status: model card and repository layout live; trained checkpoint weights are uploading rolling. Each domain folder will hold its adapter once final training completes. Subscribe to the repo to be notified.

📦 What's in this repo

Every domain comes in two variants:

Variant	What it is	Folder pattern
EmoDistill (full) — IQL + LoRA-SFT + JPO	The main method: IQL emotion selector picks the emotion, LoRA-SFT adapter expresses it, JPO refines against an LLM judge. Reported as best in the paper.	`<domain>/emodistill/`
Emotion-free baseline — LoRA-SFT only	LoRA fine-tune on the same offline corpus without the IQL emotion controller and without the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components.	`<domain>/emotionfree/`

Across the four benchmark domains:

Domain	Paper acronym	EmoDistill (full)	Emotion-free baseline
Credit / debt recovery	CRAD	`crad/emodistill/`	`crad/emotionfree/`
Disaster / emergency response	DESRD	`desrd/emodistill/`	`desrd/emotionfree/`
Student bedtime negotiation	SSAD	`ssad/emodistill/`	`ssad/emotionfree/`
Surgical scheduling	SSD	`ssd/emodistill/`	`ssd/emotionfree/`

Inside each emodistill/ subfolder:

adapter/ — LoRA-SFT+JPO adapter weights (adapter_model.safetensors, adapter_config.json)
iql/ — IQL emotion selector weights (q_net.pt, v_net.pt, policy.pt)
config.json — IQL hyperparameters, emotion vocabulary, JPO settings

Inside each emotionfree/ subfolder:

adapter/ — LoRA-SFT-only adapter weights

📐 Method

EmoDistill composes three offline-trained components at inference (full variant):

IQL emotion selector — Implicit Q-Learning over a 28-emotion vocabulary, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn.
LoRA-SFT expression imitation — LoRA adapter on top of the 7B base, trained by imitation on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances.
JPO (Judge Policy Optimization) — PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.

All three components are fully offline — no live LLM API at training time after the negotiation log is collected — and edge-deployable: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection.

The emotion-free baseline isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement.

🚀 Intended use

Primary task: emotion-aware negotiation in agent-to-agent settings across the four domains.
Deployment: on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
Base model: Qwen/Qwen2.5-7B-Instruct for all eight variants. Compatible with both OpenAI and DashScope serving stacks via the LLMClient wrapper in the code repo.

📊 Evaluation

All eight variants are evaluated on their respective subset of humanlong/emotion-negotiation-benchmarks (100 scenarios per domain). The paper reports identical metrics across the 4 domains for direct comparison.

Companion baselines (same benchmarks, same protocol — full numbers in the paper):

EmoDebt (AAMAS 2026 Main, arXiv:2503.21080) — Bayesian-optimized emotional intelligence engine.
EQ-Negotiator (NeurIPS 2025, arXiv:2511.03370) — persona + HMM + WSLS, learning-free.
EvoEmo (arXiv:2509.04310) — online evolutionary emotion policies.
EmoMAS (ACL 2026 Main, top 9%, arXiv:2604.07003) — Bayesian multi-agent orchestration, no pre-training.
Vanilla 7B (no adapter, no emotion guidance).

Headline result: EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets.

📦 Quick start (after checkpoint release)

Loading any variant follows the same pattern — just change the subfolder argument:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"
repo = "humanlong/EmoDistill-7b"

# Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree")
domain  = "crad"
variant = "emodistill"          # full IQL + SFT + JPO
# variant = "emotionfree"        # LoRA-SFT-only baseline

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter")

For the full pipeline (IQL emotion selection → LoRA generation → JPO-refined responses), use the helper code in the EmoDistill GitHub repo:

from emodistill import EmoDistillAgent
agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad")
reply = agent.respond(conversation_history, opponent_state)

⚠️ Limitations

All adapters are trained for English. Cross-lingual transfer is not evaluated.
The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
Each adapter is domain-specific — using crad/emodistill on a disaster scenario will degrade gracefully but is not the recommended use.
The model is designed to be persuasive but ethical — adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is out of scope and explicitly discouraged.

📝 License

Apache 2.0 — matches the base model.

📚 Citation

@article{long2026emodistill,
  title   = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
  author  = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
  journal = {arXiv preprint arXiv:2605.26785},
  year    = {2026}
}

🔗 The full research thread

Work	Venue	Role
EmoDebt	AAMAS 2026 Main	Bayesian-optimized emotional intelligence (foundational)
EQ-Negotiator	NeurIPS 2025	Personas + HMM + WSLS for SLMs
EvoEmo	arXiv preprint	Online evolutionary emotion policies
EmoMAS	ACL 2026 (top 9%)	Bayesian multi-agent orchestration + 4 benchmarks
EmoDistill (this repo)	under review	Offline distillation: 4 domain models + 4 emotion-free baselines in a 7B SLM

🌟 All five papers + dataset + model in one place: HF Collection — Emotion-Aware LLM Negotiation

Downloads last month: -

Model tree for humanlong/EmoDistill-7b

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Adapter

(2170)

this model

Dataset used to train humanlong/EmoDistill-7b

Collection including humanlong/EmoDistill-7b

Emotion-Aware LLM Negotiation

Collection

Personas, online policy evolution, multi-agent orchestration, and offline distillation for emotion-aware LLM negotiation agents. • 8 items • Updated 2 days ago

Papers for humanlong/EmoDistill-7b

EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation

Paper • 2605.26785 • Published 15 days ago

EmoMAS: Emotion-Aware Multi-Agent System for High-Stakes Edge-Deployable Negotiation with Bayesian Orchestration

Paper • 2604.07003 • Published Apr 12

EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation

Paper • 2511.03370 • Published Nov 5, 2025

EvoEmo: Towards Evolved Emotional Policies for Adversarial LLM Agents in Multi-Turn Price Negotiation

Paper • 2509.04310 • Published Oct 13, 2025

EQ-Negotiator: An Emotion-Reasoning LLM Agent in Credit Dialogues

Paper • 2503.21080 • Published Mar 27, 2025 • 1