Deploying a machine learning model into production can be challenging, and the difficulty depends heavily on the deployment platform and the tools used to serve the model, so choosing the right platform and tooling is crucial for effective serving and deployment. Several frameworks are available for serving PyTorch models, including TorchServe, Flask, FastAPI, and Ray Serve, each with its own trade-offs. TorchServe, developed by the PyTorch team, is a flexible tool for serving eager-mode and TorchScript models, but it has drawbacks such as a fast-moving release cycle and a dependency on the Java runtime. Cloud-hosted solutions like Amazon SageMaker are powerful but can be expensive, while lightweight web frameworks like Flask are efficient to start with but can present scaling challenges. Ray Serve, by contrast, is a library built on top of the Ray distributed computing framework; it provides a simple web server suitable for production deployments, offers end-to-end control over the request lifecycle, and allows each model to scale independently.

Combining FastAPI with Ray Serve makes it possible to scale a PyTorch model-serving API across a Ray cluster, increasing the number of requests served per second while reducing latency. With this setup, deploying and serving scalable machine learning models in production becomes far more manageable, with a range of tools and frameworks to meet different needs.
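To make the FastAPI and Ray Serve combination concrete, here is a minimal sketch of a PyTorch model served behind a FastAPI ingress on Ray Serve (using the Ray Serve 2.x API). The model, input schema, and route are illustrative placeholders, not something taken from the text above:

```python
# Minimal sketch: a PyTorch model behind a FastAPI ingress on Ray Serve.
# The Linear model and /predict schema are stand-ins for illustration;
# swap in your own trained model, preprocessing, and routes.
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from ray import serve

app = FastAPI()


class PredictRequest(BaseModel):
    # Hypothetical input schema: a flat feature vector of length 4.
    features: List[float]


@serve.deployment(num_replicas=2)  # replicas are scheduled across the Ray cluster
@serve.ingress(app)                # route FastAPI requests to this deployment
class TorchModelDeployment:
    def __init__(self):
        # Placeholder model; in practice you would load trained weights,
        # e.g. with torch.load(...) or torch.jit.load(...) for TorchScript.
        self.model = torch.nn.Linear(4, 2)
        self.model.eval()

    @app.post("/predict")
    async def predict(self, request: PredictRequest) -> dict:
        with torch.no_grad():
            x = torch.tensor(request.features).unsqueeze(0)  # shape (1, 4)
            logits = self.model(x)
        return {"prediction": int(logits.argmax(dim=1).item())}


# Deploys onto the local (or connected) Ray cluster and starts Ray Serve's
# HTTP server, which listens on port 8000 by default.
serve.run(TorchModelDeployment.bind())
```

In practice you would keep the process alive (or launch the application with the `serve run` CLI) and tune `num_replicas` or enable autoscaling to match traffic; since each replica is placed independently across the Ray cluster, adding replicas is what drives the throughput and latency improvements described above.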