Company
Date Published
Author
Stephen Balaban
Word count
1270
Language
English
Hacker News points
None

Summary

The NVIDIA A100 GPU is expected to deliver significant performance gains for deep learning workloads, particularly those that use FP16 Tensor Cores. Compared with the V100, the A100 is projected to offer a 1.95x to 2.5x speedup for language model training, with realized throughput potentially exceeding 18.1 TFLOPS. The A100's design maximizes deep learning performance by allocating more of the power budget to FP16, Tensor Cores, and features such as structured sparsity and TF32. The DGX A100 server, with eight A100 GPUs, offers higher node-to-node communication bandwidth than the DGX-1 or the Lambda Hyperplane-8 V100, which may translate into better cluster scaling. The A100 also nearly doubles FP16 efficiency (FLOPS per watt) and marks a significant process-node jump from TSMC 12nm to TSMC 7nm.
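
As a rough sanity check on the ratios above, the peak-spec arithmetic can be done directly. The sketch below uses NVIDIA's published datasheet figures (125 vs. 312 FP16 Tensor Core TFLOPS, 300 W vs. 400 W board power), which come from NVIDIA's specs rather than this article; realized training speedups are typically lower than the peak ratio because workloads are rarely purely compute-bound.

    # Back-of-the-envelope check against NVIDIA's published peak specs
    # (datasheet values, not figures from this article).
    V100_FP16_TC_TFLOPS = 125.0   # Tesla V100 FP16 Tensor Core peak
    A100_FP16_TC_TFLOPS = 312.0   # A100 FP16 Tensor Core peak (dense)
    V100_TDP_W = 300.0            # V100 SXM2 board power
    A100_TDP_W = 400.0            # A100 SXM4 board power

    peak_ratio = A100_FP16_TC_TFLOPS / V100_FP16_TC_TFLOPS
    print(f"Peak FP16 Tensor Core ratio: {peak_ratio:.2f}x")   # ~2.50x

    # FP16 FLOPS per watt, the basis for the "near doubling" claim.
    v100_eff = V100_FP16_TC_TFLOPS / V100_TDP_W   # ~0.42 TFLOPS/W
    a100_eff = A100_FP16_TC_TFLOPS / A100_TDP_W   # ~0.78 TFLOPS/W
    print(f"FP16 efficiency ratio: {a100_eff / v100_eff:.2f}x")  # ~1.87x

The 2.50x peak ratio lines up with the upper end of the projected 1.95x to 2.5x training speedup, and the ~1.87x efficiency ratio is consistent with the "near doubling" of FP16 FLOPS per watt.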