What is distributed training?

Company

Anyscale

Date Published

April 26, 2022

Author

Keith Pijanowski, Michael Galarnyk

Word count

727

Language

English

Hacker News points

None

URL

www.anyscale.com/blog/what-is-distributed-training

Summary

Training machine learning models is a slow process that often requires running many experiments with different options. Distributed machine learning addresses this problem by parallelizing model training across low-cost infrastructure in a clustered environment. This approach can cut model-training time from hours to minutes, and recent advances in distributed computing make it possible. Ray Train is a one-stop distributed training toolkit built with several goals in mind: ease of use, workstation friendliness, support for Jupyter Notebooks, fault tolerance, and easy installation. It promises to simplify distributed training of machine learning models.