This presentation provides an overview of training regimes in deep learning, from single-GPU training to multi-node distributed training. It explains how computation is performed, how gradients are transferred, and how model updates are synchronized across GPUs and nodes. The presentation also discusses hardware considerations, such as NVLink, InfiniBand networking, and GPUs that support features like GPUDirect RDMA, which enable efficient data transfer between nodes. In particular, it highlights GPUDirect RDMA for high-speed transfers, reporting up to 42 GB/s of bandwidth between nodes, which makes it well suited to training large image, language, and speech models such as NASNet, BERT, and GPT-2.
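The gradient-transfer step described above can be sketched in miniature. The following is a hedged, CPU-only illustration of synchronous data-parallel training (the function names `local_gradient`, `all_reduce_mean`, and `train_step` are invented for this sketch, and plain Python stands in for NCCL collectives over NVLink/InfiniBand): each replica computes gradients on its own data shard, an all-reduce averages them, and every replica then applies the identical update, keeping the model copies in sync.

```python
# Sketch of synchronous data-parallel training: replicas compute local
# gradients, average them via an all-reduce stand-in, and apply the same
# update. No real GPUs or interconnects are involved.

def local_gradient(w, shard):
    # Mean-squared-error gradient for the model y = w * x on this shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Stand-in for an NCCL all-reduce over NVLink/InfiniBand:
    # every replica receives the average of all local gradients.
    return sum(grads) / len(grads)

def train_step(weights, shards, lr=0.01):
    # One synchronous step: local compute, gradient transfer, shared update.
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    g = all_reduce_mean(grads)            # gradient transfer across replicas
    return [w - lr * g for w in weights]  # identical update on every replica

# Two replicas, each holding a shard of data drawn from y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
weights = [0.0, 0.0]
for _ in range(200):
    weights = train_step(weights, shards)
print(weights)  # replicas remain identical and approach w = 3
```

Because the averaged gradient is applied everywhere, the replicas never diverge; this is the invariant that frameworks such as PyTorch DistributedDataParallel maintain at scale.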