Company
Date Published
Author
Colin McGrath, Matt Howard, Philip Kiely
Word count
870
Language
English
Hacker News points
None

Summary

Splitting model serving infrastructure into a control plane and workload planes is an abstraction that makes it possible to build worldwide, multi-cloud AI model serving infrastructure. The control plane is a single Kubernetes cluster. It serves as the backend for the user interface and the model management API endpoints, and it also handles tasks such as building model serving images and balancing load across workload planes. The workload planes, by contrast, are collections of GPU resources that run model inference, and they can be set up in arbitrary cloud environments and regions. This separation of concerns is motivated by customer and operational challenges: regional preferences, GPU availability, self-hosted model inference, scaling with customer demand, and reducing compute and maintenance overhead. Each workload plane has its own capabilities and limitations, which affect the performance and security of the overall system. Because the data path is separated from control, and the workers from a centralized decision maker, the control plane can orchestrate workload planes across regions, cloud providers, and cloud accounts, while each workload plane adapts to the specific environment it runs in.
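To make the split concrete, the sketch below models a central control plane that keeps a registry of workload planes and places a deployment onto one of them, taking regional preference and GPU availability into account. Every name here (`WorkloadPlane`, `ControlPlane`, `place`, the plane names, and the GPU counts) is hypothetical. The "most free GPUs wins" placement rule is also an assumption chosen for illustration, not the authors' actual load-balancing logic. The sketch only shows the shape of the idea: one decision maker holding global state, with many independent GPU pools that can live in different clouds and accounts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkloadPlane:
    """Hypothetical record of one workload plane: a pool of GPUs in one cloud/region/account."""
    name: str
    cloud: str
    region: str
    free_gpus: dict[str, int] = field(default_factory=dict)  # GPU type -> free count


@dataclass
class ControlPlane:
    """Hypothetical central decision maker that knows about every registered workload plane."""
    planes: list[WorkloadPlane] = field(default_factory=list)

    def register(self, plane: WorkloadPlane) -> None:
        # Workload planes can be added from any cloud provider or account.
        self.planes.append(plane)

    def place(self, gpu_type: str, replicas: int,
              allowed_regions: Optional[set[str]] = None) -> Optional[WorkloadPlane]:
        # Filter by regional preference and by GPU availability.
        candidates = [
            p for p in self.planes
            if (allowed_regions is None or p.region in allowed_regions)
            and p.free_gpus.get(gpu_type, 0) >= replicas
        ]
        if not candidates:
            return None  # no plane can host this deployment right now
        # Assumed balancing rule: choose the plane with the most free GPUs of this type.
        best = max(candidates, key=lambda p: p.free_gpus[gpu_type])
        best.free_gpus[gpu_type] -= replicas  # reserve the capacity on that plane
        return best


if __name__ == "__main__":
    cp = ControlPlane()
    cp.register(WorkloadPlane("aws-us-east", "aws", "us-east-1", {"A100": 4, "H100": 2}))
    cp.register(WorkloadPlane("gcp-eu-west", "gcp", "europe-west4", {"A100": 8}))
    plane = cp.place("A100", 2, allowed_regions={"europe-west4"})
    print(plane.name if plane else "no capacity")  # gcp-eu-west
```

Keeping placement in one component means regional restrictions and capacity decisions are made with a global view, while each workload plane only needs to run the replicas it is given in its own environment.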