Company
Date Published
Author
Antoni Baum, Eric Liang, Jun Gong, Kai Fricke, Richard Liaw
Word count
1494
Language
English
Hacker News points
None

Summary

Ray is being used by leading AI organizations and projects, including OpenAI, Cohere, EleutherAI, and Alpa, to train large language models at scale and to support production deployments of generative model workloads. Generative image and language models require significant computational resources, making distributed training and deployment essential for running these workloads in production. Ray provides a flexible foundation for scaling ML workloads, tackling challenges such as partitioning models across multiple accelerators, making training tolerant of failures on preemptible instances, and deploying models that span multiple GPUs on multiple nodes. Ray also supports scale-out strategies, in which many copies of a workload run in parallel so that an online inference, fine-tuning, or training service can be operated at a lower cost than on a single high-end device. Ray Core scheduling orchestrates the large-scale distributed computation required to train generative models from scratch, while Ray Train and Ray Serve provide out-of-the-box Trainer classes and APIs for scaling model deployment graphs, respectively. The framework is being enhanced with new features, including streaming batch inference, async requests in Ray Serve, and integrations of popular frameworks such as Hugging Face Accelerate, DeepSpeed, and Alpa into Ray Train.
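
To make the scheduling piece concrete, here is a minimal sketch of Ray Core tasks fanning work out across a cluster. The `process_shard` function and the shard data are illustrative assumptions, not code from the post:

```python
import ray

ray.init()  # connect to (or start) a Ray cluster

# Ray Core schedules each task onto a node with a free resource slot
# and handles placement, data movement, and retries on failure.
# For accelerator workloads you would reserve GPUs instead,
# e.g. @ray.remote(num_gpus=1).
@ray.remote(num_cpus=1)
def process_shard(shard):
    # hypothetical per-shard work (preprocessing, a forward pass, ...)
    return sum(shard)

shards = [list(range(i, i + 10)) for i in range(0, 40, 10)]
results = ray.get([process_shard.remote(s) for s in shards])
print(results)
```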
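
The out-of-the-box Trainer classes look roughly like the sketch below, assuming a recent Ray 2.x release (the exact import paths have moved between `ray.air` and `ray.train` across versions) and a hypothetical `train_loop_per_worker` function:

```python
from ray.train import ScalingConfig, RunConfig, FailureConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # hypothetical training loop; a real one would wrap a model with
    # ray.train.torch.prepare_model and iterate over a dataset
    ...

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    # run 8 data-parallel workers, one GPU each
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
    # restart automatically after worker or node failures, which is what
    # makes training on preemptible (spot) instances practical
    run_config=RunConfig(failure_config=FailureConfig(max_failures=3)),
)
result = trainer.fit()
```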
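
Likewise, a Ray Serve deployment, sketched here with a hypothetical `TextGenerator` class, shows both the replica-based scale-out and the per-replica GPU placement described above:

```python
from ray import serve

@serve.deployment(
    num_replicas=2,                     # scale-out: run many copies of the workload
    ray_actor_options={"num_gpus": 1},  # pin each replica to a GPU (needs GPU nodes)
)
class TextGenerator:
    def __init__(self):
        # a real deployment would load model weights here
        self.model = None

    async def __call__(self, request) -> str:
        payload = await request.json()
        return f"generated text for: {payload['prompt']}"

app = TextGenerator.bind()
serve.run(app)  # serves HTTP at http://127.0.0.1:8000/
```

A client can then POST `{"prompt": "..."}` to the endpoint; scaling the service out further is a matter of raising `num_replicas`.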