Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Paper • 2505.06708 • Published May 10, 2025 • 9