Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias
Abstract
Machine unlearning can fail when a model has learned spurious correlations: in "shortcut unlearning," the model forgets the bias attribute instead of the class attribute. A new framework, CUPID, addresses this by partitioning the forget set according to loss landscape sharpness and disentangling causal and bias pathways in the model parameters.
Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily-learned, bias-aligned samples; instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
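To make the partition-then-route idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation, and everything in it is an assumption made for illustration: the function names, the tiny logistic model, the sharpness proxy (loss increase after a small step along the normalized gradient), the median threshold, the boolean pathway mask, and the sign of the update. The abstract does not say which sharpness range corresponds to the causal or the bias subset, so the two groups are labeled only as "low" and "high" sharpness.

```python
import math
import statistics


def logistic_loss(w, x, y):
    """Binary cross-entropy of a linear model on one sample, y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable form of log(1 + exp(z)) - y * z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z


def loss_gradient(w, x, y):
    """Gradient of logistic_loss with respect to the weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def sharpness(w, x, y, rho=0.05):
    """Assumed proxy: loss increase after a step of size rho along the gradient."""
    g = loss_gradient(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return logistic_loss(w_perturbed, x, y) - logistic_loss(w, x, y)


def partition_by_sharpness(scores, threshold=None):
    """Split sample indices into (low, high) sharpness groups.

    The median threshold is an assumption, as is any mapping of these
    groups to causal- or bias-approximated subsets.
    """
    if threshold is None:
        threshold = statistics.median(scores)
    low = [i for i, s in enumerate(scores) if s <= threshold]
    high = [i for i, s in enumerate(scores) if s > threshold]
    return low, high


def routed_update(params, causal_mask, g_causal, g_bias, lr=0.1):
    """Apply g_causal to masked parameters and g_bias to the rest.

    The mask stands in for a hypothetical pathway assignment; the update
    sign and any gradient refinement are left out of this sketch.
    """
    return [
        p - lr * (gc if m else gb)
        for p, m, gc, gb in zip(params, causal_mask, g_causal, g_bias)
    ]
```

In use, one would score each forget-set sample with `sharpness`, pass the scores to `partition_by_sharpness`, compute one gradient per subset, and hand both to `routed_update` so that each gradient touches only its own group of parameters.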
Community
Excited to share our new paper, "Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias". We identify a critical failure mode in current methods called "shortcut unlearning," where models paradoxically forget bias attributes instead of the target class.
To solve this, we introduce CUPID (Causal Unlearning via Pathway Identification and Disentanglement). By leveraging loss landscape sharpness, CUPID disentangles causal and bias pathways to perform a surgical unlearning update. Our framework achieves state-of-the-art forgetting performance on heavily biased datasets like Waterbirds, BAR, and Biased NICO++. Check out the paper for more details!