The text discusses NVIDIA's Multi-Instance GPU (MIG) feature on H100 GPUs, which lets developers split a single physical GPU into two or more virtual GPUs, each with its own isolated memory and compute resources. This enables efficient serving of machine learning models, providing equal or better performance than A100 GPUs at a 20% lower cost. Fractional H100 GPUs also offer support for FP8 precision, increased flexibility, and broader availability across cloud providers and regions. The guide provides an overview of how MIG works, the specs of fractional H100 GPUs, and what performance to expect when serving models on H100 MIG-based instances.
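
As a minimal sketch of the isolation MIG provides, the snippet below assumes an H100 that has already been partitioned into MIG instances and a PyTorch environment; the MIG device UUID is a hypothetical placeholder for one reported by `nvidia-smi -L`, and the memory and SM counts printed will depend on the MIG profile in use.

```python
# Sketch: confirming that a process sees only its MIG slice of an H100.
# Assumes the GPU has already been partitioned with MIG.
import os

# Pin this process to a single MIG instance before CUDA initializes.
# The UUID below is a placeholder; substitute a real one from `nvidia-smi -L`.
os.environ.setdefault(
    "CUDA_VISIBLE_DEVICES",
    "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
)

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # A fractional H100 reports only the memory and streaming multiprocessors
    # assigned to its slice, not the full card's resources.
    print(f"Device name:        {props.name}")
    print(f"Total memory (GiB): {props.total_memory / 1024**3:.1f}")
    print(f"Multiprocessors:    {props.multi_processor_count}")
else:
    print("No CUDA device visible to this process.")
```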