How continuous batching enables 23x throughput in LLM inference while reducing p50 latency
Company: Anyscale
Date published: June 22, 2023
Author(s): Cade Daniel, Chen Shen, Eric Liang, Richard Liaw
Word count: 3568
Language: English
Hacker News points: 110