BERT, a pre-trained language representation model developed by Google, has achieved state-of-the-art results on a variety of Natural Language Processing tasks. However, its official TPU-friendly implementation currently supports only single-GPU training. This blog post shows how to make BERT work with multiple GPUs using Horovod, a distributed training framework for TensorFlow. The authors make several changes to the original BERT implementation: importing Horovod's TensorFlow backend, initializing the library, pinning each worker to a GPU, and adapting gradient clipping to the distributed setting. With these modifications, training throughput on sentence classification rises from 126.92 examples/sec on 2 GPUs to 231.26 examples/sec on 4 GPUs. The authors also highlight potential pitfalls, such as using unsynchronized models across workers or failing to adapt gradient clipping. By following these changes, developers can adapt BERT for multi-GPU training and speed up fine-tuning on various NLP tasks.
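
A minimal sketch of what these changes might look like in a TensorFlow 1.x training script is shown below. The Horovod calls (`hvd.init`, `hvd.local_rank`, `hvd.DistributedOptimizer`, `hvd.BroadcastGlobalVariablesHook`) are standard library API; the surrounding names such as `build_train_op`, `loss`, and `learning_rate` are illustrative and not taken from the authors' actual diff:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

# 1. Initialize the Horovod library.
hvd.init()

# 2. Pin each worker process to a single GPU (one process per GPU).
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# 3. Wrap the optimizer so gradients are averaged across workers,
#    and apply gradient clipping to the averaged gradients.
def build_train_op(loss, learning_rate):
    optimizer = tf.train.AdamOptimizer(learning_rate)
    optimizer = hvd.DistributedOptimizer(optimizer)
    # compute_gradients goes through Horovod's allreduce; clipping is then
    # applied to the synchronized gradients rather than per-worker ones.
    grads_and_vars = optimizer.compute_gradients(loss)
    grads, tvars = zip(*grads_and_vars)
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
    return optimizer.apply_gradients(
        zip(clipped, tvars),
        global_step=tf.train.get_or_create_global_step())

# 4. Broadcast initial variables from rank 0 so every worker starts
#    from the same model state (avoids the unsynchronized-model pitfall).
bcast_hook = hvd.BroadcastGlobalVariablesHook(0)
```

The broadcast hook is passed to the estimator's or session's training hooks, and the script is then launched with one process per GPU (for example via `horovodrun -np 4 python run_classifier.py ...`).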