Author
Kai Fricke, Richard Liaw, Amog Kamsetty
Word count
1994
Language
English

Summary

`XGBoost-Ray` is a novel backend for distributed XGBoost training that leverages Ray to scale training from a single machine to clusters with hundreds of nodes while minimizing code changes. It enables multi-node and multi-GPU training, supports advanced fault-tolerance mechanisms, and integrates seamlessly with the hyperparameter optimization library `Ray Tune`. Moving from a single-node, non-distributed XGBoost setup to distributed training requires changing only three lines of code. The backend also integrates with the Scikit-learn API, providing estimators that act as drop-in replacements for models such as `XGBRegressor` or `XGBClassifier`. XGBoost-Ray achieves performance comparable to XGBoost-Spark and XGBoost-Dask in terms of speed, scalability, and peak memory utilization, while also offering full GPU support. The backend was developed in collaboration with Uber; more information is available in the GitHub repository.