How continuous batching enables 23x throughput in LLM inference while reducing p50 latency

What's this blog post about?

Company
Anyscale

Date published
June 22, 2023

Author(s)
Cade Daniel, Chen Shen, Eric Liang, Richard Liaw

Word count
3568

Language
English

Hacker News points
110
