Author
Kai Fricke, Richard Liaw, Amog Kamsetty
Word count
1994
Language
English

Summary

`XGBoost-Ray` is a novel backend for distributed XGBoost training that leverages Ray to scale training from a single machine to clusters with hundreds of nodes while minimizing code changes. It enables multi-node and multi-GPU training, supports advanced fault-tolerance mechanisms, and integrates seamlessly with the hyperparameter optimization library `Ray Tune`. Moving from a single-node, non-distributed XGBoost setup to distributed training requires changing only three lines of code. The backend also integrates with the Scikit-learn API, providing estimators that act as drop-in replacements for models such as `XGBRegressor` or `XGBClassifier`. XGBoost-Ray achieves performance comparable to XGBoost-Spark and XGBoost-Dask in terms of speed, scalability, and peak memory utilization, while also offering full GPU support. The backend was developed in collaboration with Uber; more information is available in the GitHub repository.