Enhancing Training Efficiency Using Packing with Flash Attention • Paper • 2407.09105 • Published Jul 12, 2024
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • Article • Aug 21, 2024
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs • Paper • 2408.07055 • Published Aug 13, 2024
Welcome Falcon Mamba: The first strong attention-free 7B model • Article • Aug 12, 2024