Papers
arxiv:2502.07408

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Published on Apr 16
· Submitted by
Moshe Kimhi
on Apr 20
#1 Paper of the day
Authors:
Ido Galil, Moshe Kimhi, Ran El-Yaniv

Abstract

Deep neural networks exhibit catastrophic vulnerability to minimal parameter bit flips across multiple domains, which can be identified and mitigated through targeted protection strategies.

AI-generated summary

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backward pass on random inputs. We show that this vulnerability spans multiple domains, including image classification, object detection, instance segmentation, and reasoning large language models. In image classification, flipping just two sign bits in ResNet-50 on ImageNet reduces accuracy by 99.8%. In object detection and instance segmentation, one or two sign flips in the backbone collapse COCO detection and mask AP for Mask R-CNN and YOLOv8-seg models. In language modeling, two sign flips in different experts reduce Qwen3-30B-A3B-Thinking from 78% to 0% accuracy. We also show that selectively protecting a small fraction of vulnerable sign bits provides a practical defense against such attacks.
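Concretely, a "sign-bit flip" means toggling bit 31 of an IEEE-754 float32 parameter, which exactly negates it. A minimal sketch of that single-bit corruption (the weight value here is made up for illustration; this is not the paper's code):

```python
import numpy as np

# Hypothetical weight value, for illustration only.
w = np.array([0.8173], dtype=np.float32)

# Reinterpret the float's raw bits and XOR bit 31, the IEEE-754 sign bit.
# Because `bits` is a view into `w`, the flip corrupts the weight in place.
bits = w.view(np.uint32)
bits ^= np.uint32(1) << 31  # a single-bit corruption

print(w[0])  # the weight is now exactly negated
```

One such flip per targeted parameter is all the attack model assumes; no data access or optimization is needed to apply it.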

Community



@article{galil2025maximal,
  title={Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips},
  author={Galil, Ido and Kimhi, Moshe and El-Yaniv, Ran},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://arxiv.org/pdf/2502.07408}
}

I remember a paper from Apple that showed that some parameters are "SuperParameters", meaning that if we corrupt their values, the model starts producing nonsense.

Paper author

Thanks for sharing, I found the paper (https://arxiv.org/pdf/2411.07191).
We actually had a similar intuition regarding vulnerable bits in 1P-DNL, and we carefully checked a significant number of models to identify the key factor behind a weight's importance.

The main difference is that we found second-order gradients to be more significant than high activations, which aligns with existing pruning and quantization research spanning the last decade, not only on LLMs. We discuss this in more detail in Section 3.1 and Appendix D (+Table 9) of our work.
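To make the single-pass idea concrete, here is a minimal PyTorch sketch of ranking parameters by a first-order |weight × gradient| proxy after one forward and backward pass on random inputs. The toy model, the proxy objective, and the scoring rule are all illustrative stand-ins, not the paper's actual 1P-DNL criterion:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; the sketch needs no real data, only random inputs.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.randn(8, 16)         # random inputs
loss = model(x).pow(2).mean()  # proxy objective, for illustration only
loss.backward()                # one forward + one backward pass

# Rank parameters by |weight * grad|, a common first-order saliency proxy;
# sign-flipping the top-scoring weights is the kind of attack studied.
scores = torch.cat([(p * p.grad).abs().flatten()
                    for p in model.parameters()])
top = scores.topk(2).indices   # the two most "vulnerable" coordinates
```

A second-order criterion would additionally weigh curvature (as in classic Hessian-based pruning work), which is what the authors report mattering more than activation magnitude.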

Interesting breakdown of this paper on arXivLens: https://arxivlens.com/PaperView/Details/no-data-no-optimization-a-lightweight-method-to-disrupt-neural-networks-with-sign-flips-661-f9968871
Covers the executive summary, detailed methodology, and practical applications.


Get this paper in your agent:

hf papers read 2502.07408
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

