Serving ML Models in Production: Common Patterns

Company

Anyscale

Date Published

Oct. 1, 2021

Author

Simon Mo, Edward Oakes, Michael Galarnyk

Word count

2759

Language

English

Hacker News points

None

URL

www.anyscale.com/blog/serving-ml-models-in-production-common-patterns

Summary

Ray Serve is a web framework specialized for ML model serving that aims to be easy to use, easy to deploy, and production-ready. It provides scalability, multi-model composition, batching, FastAPI integration, and framework-agnostic support. In the ML serving space, ease of development and production readiness usually trade off against each other; Ray Serve narrows that gap with a simple API for deploying and managing ML models. It natively supports common serving patterns, including ensembles, business logic, and online learning, along with authentication and input validation. With Ray Serve, you can compose multiple models, scale each component independently, and load balance calls across replicas, which makes it easier to use Ray for complex architectures involving many models that span multiple nodes.
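The ensemble and replica load-balancing ideas mentioned in the summary can be illustrated with a minimal sketch. It deliberately does not use the Ray Serve API: it is plain Python with asyncio, and every name in it (`ReplicaPool`, `make_scaler`, `ensemble`, `run_demo`) is hypothetical and chosen only for illustration. The "models" are stand-in functions that multiply their input, and round-robin is assumed as the balancing policy for simplicity.

```python
import asyncio
import itertools
from statistics import mean
from typing import Awaitable, Callable, List

# A "model" here is any async callable mapping a float to a float (an assumption).
Model = Callable[[float], Awaitable[float]]


class ReplicaPool:
    """Round-robin load balancer over identical replicas of one model."""

    def __init__(self, make_replica: Callable[[], Model], num_replicas: int) -> None:
        self.replicas: List[Model] = [make_replica() for _ in range(num_replicas)]
        self.calls = [0] * num_replicas  # per-replica call counts
        self._next = itertools.cycle(range(num_replicas))

    async def __call__(self, x: float) -> float:
        i = next(self._next)  # pick the next replica in rotation
        self.calls[i] += 1
        return await self.replicas[i](x)


def make_scaler(factor: float) -> Callable[[], Model]:
    """Factory for a hypothetical model that multiplies its input by `factor`."""

    def make_replica() -> Model:
        async def predict(x: float) -> float:
            await asyncio.sleep(0)  # stand-in for inference latency
            return x * factor

        return predict

    return make_replica


async def ensemble(pools: List[ReplicaPool], x: float) -> float:
    """Query every model concurrently and average the predictions."""
    results = await asyncio.gather(*(pool(x) for pool in pools))
    return mean(results)


def run_demo():
    # Each component is scaled independently: one replica vs. two replicas.
    pools = [ReplicaPool(make_scaler(1.0), 1), ReplicaPool(make_scaler(3.0), 2)]

    async def main() -> List[float]:
        return [await ensemble(pools, x) for x in (1.0, 2.0, 3.0, 4.0)]

    outputs = asyncio.run(main())
    return outputs, [pool.calls for pool in pools]
```

Calling `run_demo()` returns the averaged predictions together with the per-replica call counts, which shows that the two-replica pool splits its traffic evenly. In a real deployment the serving framework handles the routing and replica placement; the sketch only shows the shape of the pattern.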