How continuous batching enables 23x throughput in LLM inference while reducing p50 latency

What's this blog post about?

Company
Anyscale

Date published
June 22, 2023

Author(s)
Cade Daniel, Chen Shen, Eric Liang, Richard Liaw

Word count
3568

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.