
Linearizing LLMs with LoLCATs

What's this blog post about?

LoLCATs (Low-rank Linear Conversion via Attention Transfer) is a new approach for quickly converting existing Transformer LLMs into subquadratic models, making them both faster to run and far cheaper to produce. The method replaces softmax attentions with linear attentions trained to approximate their softmax counterparts ("attention transfer"), then recovers model quality using only parameter-efficient finetuning (e.g., low-rank adaptation). LoLCATs achieves state-of-the-art linearized quality, drastically reduces linearizing costs, and scales up to 70B and 405B LLMs.
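A minimal sketch of the attention-transfer idea, assuming a PyTorch setting: a trainable feature map stands in for the softmax kernel, and the linear-attention output is regressed (via MSE) onto the frozen softmax-attention output. The names here (FeatureMap, attention_transfer_step, etc.) are illustrative, not the LoLCATs codebase API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of "attention transfer": train a linear-attention
# module to mimic a frozen softmax-attention layer's outputs.

class FeatureMap(nn.Module):
    """Learnable feature map phi(.) applied to queries and keys."""
    def __init__(self, head_dim):
        super().__init__()
        self.proj = nn.Linear(head_dim, head_dim)

    def forward(self, x):
        # A positive feature map keeps the implied attention weights >= 0.
        return F.relu(self.proj(x))


def softmax_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v


def linear_attention(q, k, v, phi):
    # Replacing exp(q.k) with phi(q).phi(k) lets the computation factor as
    # phi(q) @ (phi(k)^T v), which is O(n) in sequence length, not O(n^2).
    q, k = phi(q), phi(k)
    kv = k.transpose(-2, -1) @ v                           # (d, d) summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
    return (q @ kv) / (z + 1e-6)


def attention_transfer_step(q, k, v, phi, optimizer):
    # Teacher: the original softmax attention (frozen, no grad).
    with torch.no_grad():
        target = softmax_attention(q, k, v)
    # Student: linear attention with a trainable feature map.
    pred = linear_attention(q, k, v, phi)
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    head_dim, seq_len = 64, 128
    phi = FeatureMap(head_dim)
    opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
    for step in range(100):
        q, k, v = (torch.randn(1, seq_len, head_dim) for _ in range(3))
        loss = attention_transfer_step(q, k, v, phi, opt)
    print(f"final transfer loss: {loss:.4f}")
```

In the full method, a step like this would run per attention head before the parameter-efficient (e.g., LoRA) finetuning stage recovers any remaining quality gap.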

Company
Together AI

Date published
Oct. 14, 2024

Author(s)
Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré

Word count
2462

Language
English

Hacker News points
1
