In this blog post, we explore the options for deploying machine learning (ML) models in production. Python has become the dominant language for data science, and general-purpose Python web frameworks such as FastAPI are gaining traction for model serving.

FastAPI is a modern, high-performance web framework for building APIs with Python. It optimizes the developer experience by building on standard Python type hints and open standards, offering high performance, faster development, fewer bugs, intuitive editor support, ease of use, robust production-ready code, and standards-based APIs.

General-purpose web frameworks, however, were designed for microservices, not specifically for ML models. Specialized ML serving libraries have emerged to maximize throughput without sacrificing latency, using techniques such as model compilation, microbatching, bin packing, and "scale to zero" autoscaling.

Ray Serve combines the two approaches: it provides both a performant Python web server and a specialized ML serving library. Developers can easily plug in a web framework such as FastAPI and pair its features with Ray Serve's own, including the ability to compose multi-model inference pipelines, scale each model independently on different hardware, and configure the number of replicas per model.
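To make the combination concrete, here is a minimal sketch of plugging a FastAPI app into Ray Serve and composing it with a downstream model deployment, using the `@serve.deployment`, `@serve.ingress`, and handle-based composition APIs from recent Ray releases. The `SentimentModel` and `APIIngress` classes, the toy keyword-matching logic, the `/predict` route, and the `num_replicas=2` setting are all illustrative placeholders, not details from this post.

```python
from fastapi import FastAPI
from ray import serve
from ray.serve.handle import DeploymentHandle

app = FastAPI()

# Downstream model deployment: scaled independently of the HTTP ingress.
@serve.deployment(num_replicas=2)  # replica count is illustrative
class SentimentModel:
    def __init__(self):
        # Placeholder for real model loading; runs once per replica.
        self.positive = {"good", "great", "excellent"}

    def predict(self, text: str) -> str:
        hits = sum(w in self.positive for w in text.lower().split())
        return "positive" if hits else "negative"

# Ingress deployment: FastAPI handles routing, validation, and API docs,
# while Ray Serve handles scaling and dispatch to the model replicas.
@serve.deployment
@serve.ingress(app)
class APIIngress:
    def __init__(self, model: DeploymentHandle):
        self.model = model

    @app.get("/predict")
    async def predict(self, text: str) -> dict:
        sentiment = await self.model.predict.remote(text)
        return {"sentiment": sentiment}

# Compose the two deployments into one application and deploy it.
serve.run(APIIngress.bind(SentimentModel.bind()))
```

Because each deployment declares its own replica count and resource requirements, the ingress and the model scale independently, and further models could be chained through additional handles in the same way, which is the multi-model composition pattern described above.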