arxiv:2602.21773

Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias

Published on Feb 25 · Submitted by June Hyoung Kwon on Mar 4

AI-generated summary

Machine unlearning faces challenges when models have learned spurious correlations, leading to shortcut unlearning where bias attributes are forgotten instead of class attributes; a new framework called CUPID addresses this by partitioning data based on loss landscape sharpness and disentangling causal and bias pathways in model parameters.

Abstract

Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples; instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
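The three steps named in the abstract (sharpness-based partitioning of the forget set, splitting parameters into causal and bias pathways, and routing each subset's gradient to its own pathway) can be illustrated with a minimal sketch in plain Python on a toy logistic-regression model. Every concrete choice here is an assumption, not CUPID's actual procedure, which the abstract does not specify: the sharpness proxy (symmetric loss change along random directions), the median split and the rule that sharper samples are "causal-approximated", the magnitude-based pathway heuristic, and the final gradient-ascent step. The names `logistic_loss`, `sharpness`, `partition_by_sharpness`, `mean_grad`, `split_pathways`, and `route_gradients` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Params = Dict[str, float]
Sample = Tuple[Sequence[float], int]          # (features, binary label)
LossFn = Callable[[Params, Sample], float]


def logistic_loss(params: Params, sample: Sample) -> float:
    """Hypothetical toy model: 2-feature logistic regression (cross-entropy loss)."""
    (x0, x1), y = sample
    z = params["w0"] * x0 + params["w1"] * x1 + params["b"]
    p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sharpness(loss_fn: LossFn, params: Params, sample: Sample,
              radius: float = 0.05, n_dirs: int = 8, seed: int = 0) -> float:
    """Hypothetical sharpness proxy: mean of (L(p+rd) + L(p-rd) - 2L(p)) / 2 over
    random unit directions d. Pairing +d/-d cancels the first-order (gradient)
    term, so the score tracks average directional curvature. A fixed seed gives
    every sample the same directions, keeping scores comparable."""
    rng = random.Random(seed)
    keys = sorted(params)
    base = loss_fn(params, sample)
    total = 0.0
    for _ in range(n_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in keys]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        plus = {k: params[k] + radius * v / norm for k, v in zip(keys, d)}
        minus = {k: params[k] - radius * v / norm for k, v in zip(keys, d)}
        total += (loss_fn(plus, sample) + loss_fn(minus, sample) - 2.0 * base) / 2.0
    return total / n_dirs


def partition_by_sharpness(loss_fn: LossFn, params: Params, forget_set: List[Sample]
                           ) -> Tuple[List[Sample], List[Sample]]:
    """Split the forget set at the median sharpness score.
    ASSUMPTION: sharper half -> causal-approximated, flatter half -> bias-approximated."""
    ranked = sorted(forget_set, key=lambda s: sharpness(loss_fn, params, s))
    mid = len(ranked) // 2
    return ranked[mid:], ranked[:mid]           # (causal_approx, bias_approx)


def mean_grad(loss_fn: LossFn, params: Params, samples: List[Sample],
              eps: float = 1e-5) -> Params:
    """Mean loss gradient over a non-empty list of samples, by central finite differences."""
    def mean_loss(p: Params) -> float:
        return sum(loss_fn(p, s) for s in samples) / len(samples)
    grads = {}
    for k in params:
        up, down = dict(params), dict(params)
        up[k] += eps
        down[k] -= eps
        grads[k] = (mean_loss(up) - mean_loss(down)) / (2.0 * eps)
    return grads


def split_pathways(g_causal: Params, g_bias: Params) -> Set[str]:
    """HYPOTHETICAL heuristic: a parameter joins the causal pathway when the causal
    subset's gradient is at least as large in magnitude; the rest form the bias pathway."""
    return {k for k in g_causal if abs(g_causal[k]) >= abs(g_bias[k])}


def route_gradients(g_causal: Params, g_bias: Params, causal_keys: Set[str]) -> Params:
    """Each parameter receives only the gradient of the subset that owns its pathway."""
    return {k: (g_causal[k] if k in causal_keys else g_bias[k]) for k in g_causal}


# Usage on a tiny, made-up forget set (all samples belong to the class to forget).
params = {"w0": 1.5, "w1": 2.0, "b": -0.5}
forget = [((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.2, -0.8), 1), ((0.2, 0.9), 1)]
causal, bias = partition_by_sharpness(logistic_loss, params, forget)
g_c = mean_grad(logistic_loss, params, causal)
g_b = mean_grad(logistic_loss, params, bias)
routed = route_gradients(g_c, g_b, split_pathways(g_c, g_b))
# Illustrative only: one gradient-ascent step on the routed gradient.
updated = {k: v + 0.1 * routed[k] for k, v in params.items()}
print(updated)
```

The routed update never mixes the two subsets' gradients on a single parameter, which is the property the abstract's "targeted update" describes; how CUPID refines those gradients and which direction it applies them in is left to the paper.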

Community

Paper submitter

Excited to share our new paper, "Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias". We identify a critical failure mode in current methods called "shortcut unlearning," where models paradoxically forget bias attributes instead of the target class.

To solve this, we introduce CUPID (Causal Unlearning via Pathway Identification and Disentanglement). By leveraging loss landscape sharpness, CUPID disentangles causal and bias pathways to perform a surgical unlearning update. Our framework achieves state-of-the-art forgetting performance on heavily biased datasets like Waterbirds, BAR, and Biased NICO++. Check out the paper for more details!

