Content Deep Dive
How continuous batching enables 23x throughput in LLM inference while reducing p50 latency
Company: Anyscale
Date Published: June 22, 2023
Authors: Cade Daniel, Chen Shen, Eric Liang, Richard Liaw
Word count: 3568
Language: English
Hacker News points: 110
URL: www.anyscale.com/blog/continuous-batching-llm-inference