Enhancing Training Efficiency Using Packing with Flash Attention • Paper • 2407.09105 • Published Jul 12, 2024
Saving Memory Using Padding-Free Transformer Layers during Finetuning • Article • mayank-mishra • Jun 11, 2024