Anyscale Endpoints (an LLM API offering) and Private Endpoints are now available as part of the Anyscale Platform. The release of LLMPerf, an open source project for benchmarking LLMs, aims to make claims about LLM performance reproducible by standardizing on key metrics such as latency, throughput, and cost. The benchmarks show that Fireworks.ai and Anyscale Endpoints are both viable options, with Anyscale being 15% cheaper and 17% faster than Fireworks in typical workloads. However, the right choice of provider depends on the specific application: ultra-low-latency applications may benefit from Perplexity's open beta, while large workloads may favor Anyscale or Fireworks. The LLMPerf benchmarking tool is available for download and aims to improve transparency and reproducibility in comparing LLM outputs.
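To make the key metrics concrete, below is a minimal sketch (not the LLMPerf implementation itself) of the kind of measurement such a benchmark performs: time to first token and token throughput for a single streamed request against an OpenAI-compatible endpoint. The base URL, model name, and environment variable are assumptions for illustration; in practice you would use the LLMPerf tool itself and average over many requests.

```python
import os
import time
from openai import OpenAI  # pip install openai

# Assumed base URL and API key variable; substitute any OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key=os.environ["ANYSCALE_API_KEY"],
)

def measure_request(model: str, prompt: str) -> dict:
    """Stream one completion and record rough latency/throughput metrics."""
    start = time.perf_counter()
    first_token_time = None
    chunks = 0

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_time is None:
                first_token_time = time.perf_counter()
            chunks += 1  # rough proxy: one streamed chunk ~ one token

    end = time.perf_counter()
    return {
        "time_to_first_token_s": first_token_time - start,
        "total_time_s": end - start,
        "approx_tokens_per_s": chunks / (end - start),
    }

# Hypothetical model name for illustration only.
print(measure_request("meta-llama/Llama-2-70b-chat-hf", "Explain LLM benchmarking."))
```

A single request like this is noisy; standardized tools run many concurrent requests and report distributions, which is what makes cross-provider comparisons reproducible.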