MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published 13 days ago
Towards Automated Kernel Generation in the Era of LLMs Paper • 2601.15727 • Published 4 days ago
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 5 days ago