NVIDIA's H100 Tensor Core GPU introduces native support for FP8 data types, which deliver up to 2x higher application performance and halve memory requirements compared to 16-bit floating-point formats. The proposed FP8 specification defines two formats: E4M3 (4 exponent bits, 3 mantissa bits), which favors precision, and E5M2 (5 exponent bits, 2 mantissa bits), which favors dynamic range. By cutting memory requirements, FP8 makes it possible to train larger models or shorten training time, which can translate into significant cost savings on cloud-based GPU usage. FP8 inference can also offer up to a 4.5x speedup over previous results, while retaining higher accuracy than INT8 quantization methods such as post-training quantization and quantization-aware training.
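
To make the trade-off between the two formats concrete, here is a minimal Python sketch that decodes a raw 8-bit pattern under the published FP8 layouts (E4M3: 1 sign, 4 exponent, 3 mantissa bits, bias 7; E5M2: 1 sign, 5 exponent, 2 mantissa bits, bias 15). The function name `decode_fp8` is illustrative and not part of any library; the special-value handling follows the NVIDIA/Arm/Intel FP8 proposal, in which E5M2 keeps IEEE-style infinities and NaNs while E4M3 reclaims most of the top exponent for normal numbers:

```python
def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    """Decode an 8-bit pattern as an FP8 value.

    Use exp_bits=4, man_bits=3 for E4M3 and exp_bits=5, man_bits=2 for E5M2.
    """
    bias = (1 << (exp_bits - 1)) - 1           # 7 for E4M3, 15 for E5M2
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)

    if exp == 0:                               # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)

    if exp == (1 << exp_bits) - 1:             # all-ones exponent
        if exp_bits == 5:                      # E5M2: IEEE-style inf/NaN
            return sign * float("inf") if man == 0 else float("nan")
        if man == (1 << man_bits) - 1:         # E4M3: only S.1111.111 is NaN
            return float("nan")
        # E4M3 otherwise treats the top exponent as an ordinary normal value

    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
```

Running the decoder on the largest finite bit patterns reproduces the well-known maxima of the two formats, showing why E4M3 is typically preferred for forward activations (finer precision) and E5M2 for gradients (wider range):

```python
print(decode_fp8(0b0_1111_110, exp_bits=4, man_bits=3))   # E4M3 max: 448.0
print(decode_fp8(0b0_11110_11, exp_bits=5, man_bits=2))   # E5M2 max: 57344.0
```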