arxiv:2606.14346

Squeeze-Release: Iterative Pruning with Exact Structural Minimization

Published on Jun 12

Submitted by Roman Denkin on Jun 15

Uppsala University


Authors: Roman Denkin

Abstract

The Squeeze-Release compression method combines pruning with structural minimization to produce significantly smaller neural networks while maintaining accuracy, and extends to transformer architectures through CompensatedLayerNorm.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Unstructured pruning produces sparse weight tensors, but standard implementations keep tensor shapes unchanged, so the deployed model is no smaller than before pruning. We present an exact structural rewrite, which we call minimization, that converts a masked network into a smaller dense network computing the same forward function up to floating-point rounding. The Squeeze-Release cycle iterates pruning and minimization with an intermediate release step that re-enables the exact-zero positions inside the compacted tensors, initializing them with small calibrated noise and turning otherwise wasted capacity back into trainable parameters. Successive cycles use that capacity to find structural redundancy that a single pass cannot reach. We additionally introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. At comparable accuracy, Squeeze-Release makes the deployable network 39x smaller than the unpruned model on a fully-connected network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny). In addition, we prove that the rewrite can be extended to transformer architectures.
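To make the idea of an exact structural rewrite concrete, here is a minimal NumPy sketch for a two-layer ReLU MLP. It only handles the simplest redundancy a mask can create: hidden units whose incoming row or outgoing column is entirely zero. The helper `minimize_hidden` and its folding rule are illustrative assumptions, not the paper's minimization algorithm, which also has to deal with convolutional and LayerNorm layers.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def mlp_forward(x, W1, b1, W2, b2):
    # x: (batch, d_in), W1: (h, d_in), b1: (h,), W2: (d_out, h), b2: (d_out,)
    return relu(x @ W1.T + b1) @ W2.T + b2


def minimize_hidden(W1, b1, W2, b2):
    """Hypothetical minimization pass for one hidden layer of a masked MLP.

    A hidden unit j is dropped when
      (a) its outgoing column W2[:, j] is all zero: it never affects the output, or
      (b) its incoming row W1[j, :] is all zero: its activation is the constant
          relu(b1[j]), whose effect is first folded into the output bias b2.
    The returned dense tensors are smaller but compute the same function.
    """
    dead_out = np.all(W2 == 0, axis=0)  # unit reaches no output
    dead_in = np.all(W1 == 0, axis=1)   # unit receives no input
    fold = dead_in & ~dead_out          # constant units that still matter
    b2_new = b2 + W2[:, fold] @ relu(b1[fold])
    keep = ~(dead_out | dead_in)
    return W1[keep], b1[keep], W2[:, keep], b2_new


# Usage: a masked 4-8-3 MLP where pruning disconnected four hidden units.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
W1[[1, 5], :] = 0.0   # units 1 and 5 lost all incoming weights
W2[:, [2, 6]] = 0.0   # units 2 and 6 lost all outgoing weights

small = minimize_hidden(W1, b1, W2, b2)
x = rng.normal(size=(10, 4))
print(small[0].shape)  # (4, 4): four hidden units remain
print(np.allclose(mlp_forward(x, W1, b1, W2, b2), mlp_forward(x, *small)))  # True
```

A unit with no inputs still emits the constant `relu(b1[j])`, which is why its contribution is folded into `b2` instead of simply being discarded; skipping that fold would change the function whenever the bias is positive.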


Community

gluck3d

Paper author Paper submitter 1 day ago

Neural networks are often far bigger than they need to be, and "pruning" refers to removing the components that contribute little to a model's performance. The catch: the most common pruning methods report a large number of disabled parameters, but the model you actually deploy is often no smaller, because the tensors keep their original dense shape for better hardware compatibility (this is also how pruning is implemented by default in PyTorch).
Our new preprint, "Squeeze-Release: Iterative Pruning with Exact Structural Minimization," closes that gap. We rebuild a pruned network as a genuinely smaller dense one with the same output, then iterate to keep finding redundancy that a single pass would miss. In practice, this compresses the deployable model by up to ~39× on a fully-connected network and ~14.8× on ConvNeXt-Tiny, at comparable accuracy.
We also propose CompensatedLayerNorm, a modified LayerNorm that allows connections passing through LayerNorm to be pruned in a function-preserving way.
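The gap described in the first paragraph is easy to reproduce. The NumPy sketch below applies magnitude pruning through a 0/1 mask, in the same mask-multiply spirit as PyTorch's `torch.nn.utils.prune` (which keeps a `weight_orig` parameter and a `weight_mask` buffer next to the layer). `magnitude_mask` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np


def magnitude_mask(W, amount):
    """Hypothetical helper: 0/1 mask zeroing the `amount` fraction of smallest |w|."""
    k = int(round(amount * W.size))
    mask = np.ones_like(W)
    if k > 0:
        # indices of the k smallest magnitudes in the flattened tensor
        idx = np.argsort(np.abs(W), axis=None)[:k]
        mask.flat[idx] = 0.0
    return mask


rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))
mask = magnitude_mask(W, amount=0.9)
W_pruned = W * mask

print(W_pruned.shape)               # (256, 512): the tensor shape is unchanged
print(np.count_nonzero(W_pruned))   # 13107 surviving weights
print(W_pruned.nbytes == W.nbytes)  # True: dense storage is unchanged
```

Ninety percent of the weights are "disabled", yet the array still occupies exactly as much memory as before, which is the gap that minimization is meant to close.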

noahml

about 16 hours ago

Neat paper. The idea of turning sparse masked networks into actually smaller dense ones via structural minimization is a nice shift from just having zeros sit in memory. It is interesting to see that the re-introduced noise during the release step actually helps the model find more redundancy in later iterations.
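A release step of the kind described in the abstract can be sketched as follows. The calibration rule used here, Gaussian noise whose standard deviation is a small fraction `scale` of the surviving weights' standard deviation, is an assumption made for illustration; the paper's actual calibration may differ, and `release` is a hypothetical name.

```python
import numpy as np


def release(W, scale=0.01, rng=None):
    """Hypothetical release step for a compacted (already minimized) weight tensor.

    Every exact-zero entry is re-enabled with Gaussian noise whose standard
    deviation is `scale` times that of the surviving nonzero weights (an assumed
    calibration rule). Surviving weights are returned unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    zeros = W == 0
    survivors = W[~zeros]
    ref_std = survivors.std() if survivors.size else 1.0
    out = W.copy()
    out[zeros] = rng.normal(0.0, scale * ref_std, size=int(zeros.sum()))
    return out


# Usage: a compacted tensor that still contains exact zeros inside it.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
W[np.abs(W) < 0.5] = 0.0  # zeros left behind by pruning
W_released = release(W, scale=0.01, rng=np.random.default_rng(1))

kept = W != 0
print(np.array_equal(W_released[kept], W[kept]))  # True: survivors untouched
```

Keeping the noise small relative to the surviving weights leaves the network's function nearly unchanged at the start of the next training phase, while giving the optimizer nonzero parameters to work with in positions a subsequent pruning pass can keep or remove again.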

I’m curious, how does the CompensatedLayerNorm handle the math differently to allow for that channel reduction across residual streams?

I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/6de1fcd5-b453-47df-bf44-c38e1a92d27b

gluck3d

Paper author about 16 hours ago

Thank you for your kind evaluation.
CompensatedLayerNorm is not tied to the residual-stream concept. It is a way to prune connections between layers that pass through a conventional LayerNorm when the number of channels is reduced, in a function-preserving way and with a minimal number of additional stored parameters.
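One way to see why a plain LayerNorm blocks channel removal, and how a little stored state can compensate, is the following sketch. It assumes the removed channels always carry known constant inputs (for instance exact zeros after upstream pruning) and that their normalized outputs are not used downstream. Under those assumptions, storing the original width `N` plus the sum of the removed constants and the sum of their squares is enough to reproduce the original mean and variance exactly. `CompensatedLayerNormSketch` is a hypothetical illustration, not the paper's CompensatedLayerNorm.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    # Standard LayerNorm over the last axis, with biased variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta


class CompensatedLayerNormSketch:
    """Hypothetical LayerNorm over a reduced channel set (not the paper's design).

    Assumes each removed channel k always carried the constant input c_k and that
    its normalized output is unused downstream. Three stored scalars (the original
    width N, sum c_k, sum c_k**2) recover the original mean and variance exactly.
    """

    def __init__(self, gamma_kept, beta_kept, removed_consts, n_total, eps=1e-5):
        c = np.asarray(removed_consts, dtype=float)
        self.gamma, self.beta, self.eps = gamma_kept, beta_kept, eps
        self.n_total = n_total
        self.s1 = c.sum()
        self.s2 = (c ** 2).sum()

    def __call__(self, x_kept):
        n = self.n_total
        n_removed = n - x_kept.shape[-1]
        mu = (x_kept.sum(axis=-1, keepdims=True) + self.s1) / n
        # sum_k (c_k - mu)^2 expanded so that only s1 and s2 are needed
        removed_sq = self.s2 - 2.0 * mu * self.s1 + n_removed * mu ** 2
        var = (((x_kept - mu) ** 2).sum(axis=-1, keepdims=True) + removed_sq) / n
        return self.gamma * (x_kept - mu) / np.sqrt(var + self.eps) + self.beta


# Usage: drop 3 of 16 channels whose inputs are constant (two zeros, one 0.3).
rng = np.random.default_rng(0)
C = 16
gamma, beta = rng.normal(size=C), rng.normal(size=C)
x = rng.normal(size=(5, C))
removed, consts = np.array([2, 7, 11]), np.array([0.0, 0.0, 0.3])
x[:, removed] = consts
kept = np.setdiff1d(np.arange(C), removed)

reference = layer_norm(x, gamma, beta)[:, kept]
naive = layer_norm(x[:, kept], gamma[kept], beta[kept])
cln = CompensatedLayerNormSketch(gamma[kept], beta[kept], consts, n_total=C)

print(np.allclose(naive, reference))            # False: statistics shifted
print(np.allclose(cln(x[:, kept]), reference))  # True: exact up to rounding
```

The naive alternative, a standard LayerNorm applied only to the kept channels, divides by the wrong width and ignores the removed constants, so its output changes; the compensated version matches the full layer on every kept channel.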

librarian-bot

about 3 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

