33% faster LLM inference with FP8 quantization

What's this blog post about?

Company
Baseten

Date published
March 14, 2024

Author(s)
Pankaj Gupta, Philip Kiely

Word count
1876

Language
English

Hacker News points
None found.
By Matt Makai. 2021-2024.