arxiv:2508.14004

GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks

Published on Aug 19, 2025

Authors:

Abstract

Quantized neural networks are modeled as noisy channels where rounding reduces capacity, and a differentiable approach with learnable parameters identifies quantization bottlenecks while maintaining efficiency.

AI-generated summary

Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify resulting quantization bottlenecks by casting fine-tuning as a smooth, constrained optimization problem. Our approach employs a fully differentiable Straight-Through Estimator (STE) with learnable bit-width, noise scale and clamp bounds, and enforces a target bit-width via an exterior-point penalty; mild metric smoothing (via distillation) stabilizes training. Despite its simplicity, the method attains competitive accuracy down to the extreme W1A1 setting while retaining the efficiency of STE.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.14004 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.14004 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.14004 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.