NVIDIA's H100 Tensor Core GPU introduces native support for FP8 data types, which deliver up to 2x higher application performance and halve memory requirements compared to 16-bit floating-point formats. The proposed FP8 specification defines two formats: E4M3 (4 exponent bits, 3 mantissa bits), which favors precision, and E5M2 (5 exponent bits, 2 mantissa bits), which favors dynamic range. By cutting memory requirements, FP8 makes it possible to train larger models or shorten training time, which can translate into significant cost savings on cloud-based GPU usage. FP8 inference can also offer up to a 4.5x speedup over previous results, while retaining higher accuracy than INT8 quantization methods such as post-training quantization and quantization-aware training.
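
To make the trade-off between the two formats concrete, here is a minimal Python sketch that decodes a raw 8-bit pattern under the published FP8 layouts (E4M3: 1 sign, 4 exponent, 3 mantissa bits, bias 7; E5M2: 1 sign, 5 exponent, 2 mantissa bits, bias 15). The function name `decode_fp8` is illustrative and not part of any library; the special-value handling follows the NVIDIA/Arm/Intel FP8 proposal, in which E5M2 keeps IEEE-style infinities and NaNs while E4M3 reclaims most of the top exponent for normal numbers:

```python
def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    """Decode an 8-bit pattern as an FP8 value.

    Use exp_bits=4, man_bits=3 for E4M3 and exp_bits=5, man_bits=2 for E5M2.
    """
    bias = (1 << (exp_bits - 1)) - 1           # 7 for E4M3, 15 for E5M2
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)

    if exp == 0:                               # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)

    if exp == (1 << exp_bits) - 1:             # all-ones exponent
        if exp_bits == 5:                      # E5M2: IEEE-style inf/NaN
            return sign * float("inf") if man == 0 else float("nan")
        if man == (1 << man_bits) - 1:         # E4M3: only S.1111.111 is NaN
            return float("nan")
        # E4M3 otherwise treats the top exponent as an ordinary normal value

    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
```

Running the decoder on the largest finite bit patterns reproduces the well-known maxima of the two formats, showing why E4M3 is typically preferred for forward activations (finer precision) and E5M2 for gradients (wider range):

```python
print(decode_fp8(0b0_1111_110, exp_bits=4, man_bits=3))   # E4M3 max: 448.0
print(decode_fp8(0b0_11110_11, exp_bits=5, man_bits=2))   # E5M2 max: 57344.0
```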