Company
Date Published
Author
Tengwei Cai, Yang Liu, Chengxi Luo, Xiaofeng Yang
Word count
2353
Language
English
Hacker News points
3

Summary

This blog post from Ant Group describes Ant Ray Serving, a scalable serving architecture built on top of Ray. The deployment uses 240,000 cores for model serving, 3.5x the scale of the previous year, and reaches 1.37 million TPS at peak. The authors present Ant Ray Serving as an online service platform that deploys users' Java and Python code as distributed online services and provides scaling, traffic routing, and monitoring. They highlight two business scenarios: Model Serving, and EventBus, an event-driven serverless platform built on Ant Ray Serving. For model serving, they describe the performance and traffic isolation problems they encountered and how service isolation and resource isolation in the serving layer addressed them. They also introduce automatic scaling for both service instances and Ray clusters. Finally, they describe their cooperation with the Anyscale Ray Serve team to integrate Ant Ray Serving into the open source Ray Serve architecture, which adds Java language support to Ray Serve, cross-language deployment, and componentization with flexible scalability.