Ray Serve: Tackling the cost and complexity of serving AI in production

Company

Anyscale

Date Published

Sept. 25, 2023

Author

Akshay Malik, Edward Oakes, Phi Nguyen

Word count

2392

Language

English

Hacker News points

None

URL

www.anyscale.com/blog/tackling-the-cost-and-complexity-of-serving-ai-in-production-ray-serve

Summary

Ray Serve and Anyscale Services are now generally available, offering a flexible, performant, and scalable way to serve machine learning models. They target common challenges in AI application development: improving time to market, reducing cost, and ensuring production reliability. Ray Serve provides simplicity, flexibility, and scaling, while Anyscale Services manages deployment infrastructure and integrations, enabling reliable production deployments with zero-downtime upgrades and canary rollouts. Running Ray Serve on Anyscale Services optimizes the full serving stack across the model, application, and hardware layers, making it a future-proof solution for AI applications. Ray Serve has already seen significant adoption across industries, with companies such as Ant Group and Samsara improving their production ML pipeline performance and reducing costs. Anyscale Services also supports heterogeneous hardware, model multiplexing, and request batching, providing cost reductions of 2-3x and better GPU availability. The result is a managed, production-ready platform for building and deploying AI applications, designed to meet the growing demand for them.