Linearizing LLMs with LoLCATs
LoLCATs (Low-rank Linear Conversion via Attention Transfer) is a new approach for quickly and efficiently creating subquadratic LLMs from existing Transformers. The method replaces softmax attentions with linear attentions trained to approximate their softmax counterparts ("attention transfer"), then adjusts the model with only parameter-efficient finetuning (e.g., low-rank adaptation). LoLCATs achieves state-of-the-art linearized quality, drastically reduces linearizing costs, and scales up to 70B and 405B LLMs.
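To make the two-step recipe concrete, the snippet below is a minimal, non-causal PyTorch sketch of the attention-transfer idea, not the authors' implementation: a learnable feature map is trained so that linear attention reproduces frozen softmax-attention outputs. All names here (`LearnableFeatureMap`, `linear_attention`, the feature dimension) are hypothetical, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableFeatureMap(nn.Module):
    """Hypothetical learnable feature map phi(x) for linear attention."""
    def __init__(self, head_dim: int, feature_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(head_dim, feature_dim, bias=False)

    def forward(self, x):
        # Non-negative features keep the implied attention weights positive.
        return F.softmax(self.proj(x), dim=-1)

def softmax_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, phi, eps=1e-6):
    q_f, k_f = phi(q), phi(k)                 # (B, T, F)
    kv = k_f.transpose(-2, -1) @ v            # (B, F, D): O(T) instead of O(T^2)
    normalizer = q_f @ k_f.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (B, T, 1)
    return (q_f @ kv) / (normalizer + eps)

# Attention transfer: train phi so linear attention matches the softmax teacher.
torch.manual_seed(0)
B, T, D = 2, 16, 32
q, k, v = (torch.randn(B, T, D) for _ in range(3))
phi = LearnableFeatureMap(head_dim=D)
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)

with torch.no_grad():
    target = softmax_attention(q, k, v)       # frozen teacher outputs

for step in range(200):
    loss = F.mse_loss(linear_attention(q, k, v, phi), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Per the summary above, the full pipeline then recovers any remaining quality gap with parameter-efficient finetuning (e.g., LoRA) rather than updating the base model's weights.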
Company
Together AI
Date published
Oct. 14, 2024
Author(s)
Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré
Word count
2462
Language
English
Hacker News points
1