FlashAttention received the inaugural Stanford Open Source Software award in recognition of its significant impact, engagement, and adoption across the industry. FlashAttention is an algorithm that reorders the attention computation to speed up Transformer training and inference, reducing memory usage from quadratic to linear in sequence length. Its variants, including FlashAttention-2, offer further improvements, enabling up to 4x faster training and fine-tuning of large language models (LLMs) and achieving 72% model FLOPs utilization when training on NVIDIA A100 GPUs. The technology is now widely used by companies and researchers and has been integrated into popular frameworks such as PyTorch and Hugging Face, with its GitHub repo receiving over 11k stars. FlashAttention-2 is designed as a drop-in replacement for the original algorithm, offering roughly a 2x speedup on the core attention operation and further end-to-end gains when training Transformers.
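
Because FlashAttention is integrated into PyTorch's built-in scaled dot-product attention, a minimal sketch of invoking it might look like the following. The tensor shapes, dtype, and the `sdp_kernel` backend toggles are illustrative assumptions (PyTorch 2.x with a CUDA build that ships the FlashAttention kernel), not details from the original announcement.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim);
# FlashAttention requires fp16/bf16 inputs on a CUDA device.
q = torch.randn(2, 16, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 16, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 16, 4096, 64, device="cuda", dtype=torch.float16)

# Restrict PyTorch's fused attention to the FlashAttention backend
# (assumption: available in this PyTorch build; otherwise this raises).
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 16, 4096, 64])
```

The same call works without the context manager; PyTorch then picks a backend automatically, which is how the "drop-in replacement" property shows up in practice.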