FlexGen: High-throughput generative inference of large language models with a single GPU

What's this blog post about?

Company
Together AI

Date published
March 13, 2023

Author(s)
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

Word count
317

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.