Parallax: Parameterized Local Linear Attention for Language Modeling Paper • 2605.29157 • Published 10 days ago • 11
Attention 0.6B AdamW-WSD training trajectory Collection Per-step record (every 500 steps, 40 checkpoints) of the 0.6B Qwen3 softmax-attention baseline trained with AdamW + WSD on 80B tokens. • 40 items • Updated 27 days ago