FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
What's this blog post about?
Company
Together AI
Date published
July 11, 2024
Author(s)
Jay Shah (Colfax Research), Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar (NVIDIA), Pradeep Ramani (NVIDIA), Tri Dao (Princeton University, Together AI)
Word count
1753
Hacker News points
287
Language
English