HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering Paper • 2512.14870 • Published 12 days ago • 12
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28 • 36
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing Paper • 2501.00658 • Published Dec 31, 2024 • 7
Nested Attention: Semantic-aware Attention Values for Concept Personalization Paper • 2501.01407 • Published Jan 2 • 10
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 108