EmoDistill-7b

Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.


EmoDistill turns a 7B base LLM into a domain-adaptive, emotion-aware negotiation agent. It decouples what emotion to show (an IQL emotion selector over a 28-emotion vocabulary) from how to express it (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge). Both are learned from a fixed offline corpus of LLM-vs-LLM negotiations.

This repository hosts all eight model variants from the paper: a full IQL + LoRA-SFT + JPO stack and an emotion-free LoRA-SFT-only baseline for each of the four benchmark domains (CRAD, DESRD, SSAD, SSD), for direct head-to-head comparison.

[Figure: EmoDistill workflow]

🚧 Status: the model card and repository layout are live; trained checkpoint weights are being uploaded on a rolling basis. Each domain folder will hold its adapter once final training completes. Subscribe to the repo to be notified.


📦 What's in this repo

Every domain comes in two variants:

| Variant | What it is | Folder pattern |
| --- | --- | --- |
| EmoDistill (full): IQL + LoRA-SFT + JPO | The main method: the IQL emotion selector picks the emotion, the LoRA-SFT adapter expresses it, and JPO refines against an LLM judge. Reported as best in the paper. | <domain>/emodistill/ |
| Emotion-free baseline: LoRA-SFT only | LoRA fine-tune on the same offline corpus without the IQL emotion controller and without the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components. | <domain>/emotionfree/ |

Across the four benchmark domains:

| Domain | Paper acronym | EmoDistill (full) | Emotion-free baseline |
| --- | --- | --- | --- |
| Credit / debt recovery | CRAD | crad/emodistill/ | crad/emotionfree/ |
| Disaster / emergency response | DESRD | desrd/emodistill/ | desrd/emotionfree/ |
| Student bedtime negotiation | SSAD | ssad/emodistill/ | ssad/emotionfree/ |
| Surgical scheduling | SSD | ssd/emodistill/ | ssd/emotionfree/ |
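
The folder grid above is regular enough to generate rather than type out. A minimal sketch, assuming only the domain and variant folder names listed in the table; the helper name `variant_folders` is hypothetical and not part of the EmoDistill code repo:

```python
from itertools import product

# Folder names taken from the table above.
DOMAINS = ("crad", "desrd", "ssad", "ssd")
VARIANTS = ("emodistill", "emotionfree")


def variant_folders(domains=DOMAINS, variants=VARIANTS):
    """Return every '<domain>/<variant>' folder, ordered by domain first."""
    return [f"{domain}/{variant}" for domain, variant in product(domains, variants)]


folders = variant_folders()
for folder in folders:
    print(folder)
```

Iterating over this list is one way to run the same evaluation script over all eight variants without hard-coding paths.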

Inside each emodistill/ subfolder:

  • adapter/: LoRA-SFT+JPO adapter weights (adapter_model.safetensors, adapter_config.json)
  • iql/: IQL emotion selector weights (q_net.pt, v_net.pt, policy.pt)
  • config.json: IQL hyperparameters, emotion vocabulary, JPO settings

Inside each emotionfree/ subfolder:

  • adapter/: LoRA-SFT-only adapter weights
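
Because checkpoints are still being uploaded, it can help to check which files a given variant is expected to contain before trying to load it. The sketch below builds that manifest from the layout described above. It assumes the emotion-free adapter uses the same standard PEFT file names as the full variant (the card does not list them explicitly), and the helper names are hypothetical:

```python
# File names as listed in the repo layout; the emotion-free adapter file
# names are an assumption (standard PEFT output names).
ADAPTER_FILES = ("adapter_model.safetensors", "adapter_config.json")
IQL_FILES = ("q_net.pt", "v_net.pt", "policy.pt")


def expected_files(domain, variant):
    """List the repo-relative paths a variant folder should contain."""
    if variant not in ("emodistill", "emotionfree"):
        raise ValueError(f"unknown variant: {variant!r}")
    root = f"{domain}/{variant}"
    files = [f"{root}/adapter/{name}" for name in ADAPTER_FILES]
    if variant == "emodistill":
        # Only the full variant ships the IQL selector and its config.
        files += [f"{root}/iql/{name}" for name in IQL_FILES]
        files.append(f"{root}/config.json")
    return files


def missing_files(domain, variant, repo_files):
    """Return expected paths that are absent from a listing of repo files."""
    present = set(repo_files)
    return [path for path in expected_files(domain, variant) if path not in present]
```

The `repo_files` argument can be any list of repo-relative paths, for example the result of `huggingface_hub.list_repo_files("humanlong/EmoDistill-7b")`.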

Method

EmoDistill composes three offline-trained components at inference (full variant):

  1. IQL emotion selector: Implicit Q-Learning over a 28-emotion vocabulary, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn.
  2. LoRA-SFT expression imitation: a LoRA adapter on top of the 7B base, trained by imitation on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances.
  3. JPO (Judge Policy Optimization): a PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
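
The three steps above can be illustrated with their textbook loss shapes. This is a rough NumPy sketch under stated assumptions, not the paper's implementation: the expectile loss is the standard IQL value objective, the advantage is assumed to be A = Q - V, and the values of `tau`, `clip_eps` and `kl_coef` as well as the simple log-ratio KL estimate are placeholders chosen for illustration.

```python
import numpy as np


def expectile_loss(q_values, v_values, tau=0.7):
    """Standard IQL value loss: asymmetric squared error on u = Q - V."""
    u = np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    weight = np.where(u < 0, 1.0 - tau, tau)
    return float(np.mean(weight * u ** 2))


def top_k_by_advantage(q_values, v_values, k):
    """Indices of the k turns with the largest advantage (assumed A = Q - V)."""
    adv = np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
    order = np.argsort(-adv, kind="stable")
    return order[:k].tolist()


def clipped_surrogate_with_kl(logp_new, logp_old, logp_sft, judge_adv,
                              clip_eps=0.2, kl_coef=0.1):
    """Loss to minimise: negative PPO-clipped objective plus a KL anchor.

    judge_adv stands in for a per-turn advantage derived from judge scores.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    logp_sft = np.asarray(logp_sft, dtype=float)
    judge_adv = np.asarray(judge_adv, dtype=float)

    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * judge_adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * judge_adv
    surrogate = np.mean(np.minimum(unclipped, clipped))

    # Crude sample-based estimate of the divergence from the SFT policy.
    kl_to_sft = np.mean(logp_new - logp_sft)
    return float(-surrogate + kl_coef * kl_to_sft)
```

With `tau` above 0.5 the expectile loss penalises underestimating Q more than overestimating it, which is what lets IQL approximate a maximum over actions using only logged data.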

All three components are trained fully offline (no live LLM API is needed at training time once the negotiation log is collected) and are edge-deployable: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection.
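
At inference the two learned pieces compose in a simple way: the Q-network scores each emotion, one is picked, and the chosen emotion conditions generation. A minimal sketch of that control flow follows. The greedy argmax rule, the placeholder vocabulary entries and the prompt format are all assumptions for illustration; the actual vocabulary and conditioning format are defined in each variant's config.json and the code repo.

```python
# Placeholder names: the real 28-emotion vocabulary lives in config.json.
VOCAB = [f"emotion_{i:02d}" for i in range(28)]


def select_emotion(q_values, vocabulary=VOCAB):
    """Greedy selection: return the emotion with the highest Q-value."""
    if len(q_values) != len(vocabulary):
        raise ValueError("expected one Q-value per emotion in the vocabulary")
    best = max(range(len(vocabulary)), key=lambda i: q_values[i])
    return vocabulary[best]


def build_turn_prompt(history, emotion):
    """Hypothetical prompt format: an emotion tag followed by the dialogue."""
    lines = [f"[target emotion: {emotion}]"]
    lines += [f"{speaker}: {text}" for speaker, text in history]
    return "\n".join(lines)


# Example: Q-values from the selector would replace this dummy vector.
q = [0.0] * 28
q[5] = 1.0
prompt = build_turn_prompt([("opponent", "Hi")], select_emotion(q))
```

The resulting prompt would then be passed to the 7B model with the LoRA adapter attached.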

The emotion-free baseline isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement.

🚀 Intended use

  • Primary task: emotion-aware negotiation in agent-to-agent settings across the four domains.
  • Deployment: on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
  • Base model: Qwen/Qwen2.5-7B-Instruct for all eight variants. Compatible with both OpenAI and DashScope serving stacks via the LLMClient wrapper in the code repo.

📊 Evaluation

All eight variants are evaluated on their respective subset of humanlong/emotion-negotiation-benchmarks (100 scenarios per domain). The paper reports the same set of metrics for all four domains to allow direct comparison.

Companion baselines use the same benchmarks and the same protocol; full numbers are in the paper.

Headline result: EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets.

📦 Quick start (after checkpoint release)

Loading any variant follows the same pattern; just change the subfolder argument:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"
repo = "humanlong/EmoDistill-7b"

# Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree")
domain  = "crad"
variant = "emodistill"          # full IQL + SFT + JPO
# variant = "emotionfree"        # LoRA-SFT-only baseline

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter")
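
The snippet above loads only the LoRA adapter. For the full variant, the IQL selector weights sit in a separate iql/ folder. The sketch below shows one way to fetch them with `huggingface_hub.hf_hub_download`, assuming the files are published under the names given in the repo layout. The helper names are hypothetical, and how the .pt files should be deserialized (state dict or whole module) is not stated in this card, so that step is left out.

```python
IQL_FILES = ("q_net.pt", "v_net.pt", "policy.pt")


def iql_weight_paths(domain):
    """Repo-relative paths of the IQL selector weights for one domain."""
    return [f"{domain}/emodistill/iql/{name}" for name in IQL_FILES]


def download_iql_weights(domain, repo="humanlong/EmoDistill-7b"):
    """Download the IQL files and return {file name: local cache path}."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return {
        path.rsplit("/", 1)[-1]: hf_hub_download(repo_id=repo, filename=path)
        for path in iql_weight_paths(domain)
    }
```

Calling `download_iql_weights("crad")` requires network access and will fail until the corresponding checkpoint has been uploaded.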

For the full pipeline (IQL emotion selection → LoRA generation → JPO-refined responses), use the helper code in the EmoDistill GitHub repo:

from emodistill import EmoDistillAgent
agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad")
reply = agent.respond(conversation_history, opponent_state)

⚠️ Limitations

  • All adapters are trained for English. Cross-lingual transfer is not evaluated.
  • The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
  • Each adapter is domain-specific: using crad/emodistill on a disaster scenario will degrade gracefully but is not the recommended use.
  • The model is designed to be persuasive but ethical; adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is out of scope and explicitly discouraged.

License

Apache 2.0, matching the base model.

📚 Citation

@article{long2026emodistill,
  title   = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
  author  = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
  journal = {arXiv preprint arXiv:2605.26785},
  year    = {2026}
}

🔗 The full research thread

| Work | Venue | Role |
| --- | --- | --- |
| EmoDebt | AAMAS 2026 Main | Bayesian-optimized emotional intelligence (foundational) |
| EQ-Negotiator | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
| EvoEmo | arXiv preprint | Online evolutionary emotion policies |
| EmoMAS | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
| EmoDistill (this repo) | under review | Offline distillation: 4 domain models + 4 emotion-free baselines in a 7B SLM |

🌟 All five papers + dataset + model in one place: the HF Collection "Emotion-Aware LLM Negotiation"
