Linearizing LLMs with LoLCATs
LoLCATs (Low-rank Linear Conversion via Attention Transfer) is a new approach for quickly and efficiently creating subquadratic LLMs from existing Transformers. The method replaces softmax attentions with linear attentions trained to approximate their softmax counterparts ("attention transfer"), then adjusts the model with only parameter-efficient finetuning (e.g., low-rank adaptation). LoLCATs achieves state-of-the-art linearized quality, drastically reduces linearizing costs, and scales up to 70B and 405B LLMs.
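To make the two-step recipe concrete, the snippet below is a minimal, non-causal PyTorch sketch of the attention-transfer idea, not the authors' implementation: a learnable feature map is trained so that linear attention reproduces frozen softmax-attention outputs. All names here (`LearnableFeatureMap`, `linear_attention`, the feature dimension) are hypothetical, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableFeatureMap(nn.Module):
    """Hypothetical learnable feature map phi(x) for linear attention."""
    def __init__(self, head_dim: int, feature_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(head_dim, feature_dim, bias=False)

    def forward(self, x):
        # Non-negative features keep the implied attention weights positive.
        return F.softmax(self.proj(x), dim=-1)

def softmax_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, phi, eps=1e-6):
    q_f, k_f = phi(q), phi(k)                 # (B, T, F)
    kv = k_f.transpose(-2, -1) @ v            # (B, F, D): O(T) instead of O(T^2)
    normalizer = q_f @ k_f.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (B, T, 1)
    return (q_f @ kv) / (normalizer + eps)

# Attention transfer: train phi so linear attention matches the softmax teacher.
torch.manual_seed(0)
B, T, D = 2, 16, 32
q, k, v = (torch.randn(B, T, D) for _ in range(3))
phi = LearnableFeatureMap(head_dim=D)
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)

with torch.no_grad():
    target = softmax_attention(q, k, v)       # frozen teacher outputs

for step in range(200):
    loss = F.mse_loss(linear_attention(q, k, v, phi), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Per the summary above, the full pipeline then recovers any remaining quality gap with parameter-efficient finetuning (e.g., LoRA) rather than updating the base model's weights.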
Company
Together AI
Date published
Oct. 14, 2024
Author(s)
Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré
Word count
2462
Language
English
Hacker News points
1