Flux Attention: Context-Aware Hybrid Attention for Efficient LLM Inference Paper • 2604.07394 • Published 6 days ago • 14
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 8 days ago • 105