Control plane vs workload plane in model serving infrastructure

Company

Baseten

Date Published

May 29, 2024

Author

Colin McGrath, Matt Howard, Philip Kiely

Word count

870

Language

English

Hacker News points

None

URL

www.baseten.co/blog/control-plane-vs-workload-plane-in-model-serving-infrastructure

Summary

Separating model serving infrastructure into a control plane and workload planes is an abstraction that makes it possible to build worldwide, multi-cloud AI model serving infrastructure. The control plane is a single Kubernetes cluster that serves as the backend for the user interface and the model management API endpoints; it also builds model serving images and balances load across workload planes. The workload planes are collections of GPU resources that run model inference and can be set up in arbitrary cloud environments and regions. The separation is motivated by customer and operational challenges: regional preference, GPU availability, self-hosted model inference, scaling with customer demand, and reducing compute and maintenance overhead. Each workload plane has its own capabilities and limitations, which affect the overall system's performance and security. Because the workers that handle data are separated from a centralized decision maker, the control plane can orchestrate workload planes across regions, cloud providers, and cloud accounts, while each workload plane adapts to the specific environment it runs in.
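The split described above can be illustrated with a minimal sketch. This is not Baseten's implementation: the class names, fields, and the placement rule (treat a regional preference as a hard filter, then pick the plane with the most free GPUs) are all hypothetical, chosen only to show a single decision maker choosing among workload planes that live in different clouds and regions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkloadPlane:
    """Hypothetical record of one pool of GPU resources."""
    name: str
    cloud: str
    region: str
    free_gpus: int


class ControlPlane:
    """Hypothetical central decision maker that tracks workload planes."""

    def __init__(self) -> None:
        self.planes: Dict[str, WorkloadPlane] = {}

    def register(self, plane: WorkloadPlane) -> None:
        # A workload plane in any cloud or region announces itself here.
        self.planes[plane.name] = plane

    def place(self, gpus_needed: int,
              preferred_region: Optional[str] = None) -> Optional[WorkloadPlane]:
        # Keep only planes with enough free GPUs.
        candidates = [p for p in self.planes.values()
                      if p.free_gpus >= gpus_needed]
        # Assumption: a regional preference is a hard constraint.
        if preferred_region is not None:
            candidates = [p for p in candidates
                          if p.region == preferred_region]
        if not candidates:
            return None
        # Assumption: balance load by choosing the plane with the most free GPUs.
        chosen = max(candidates, key=lambda p: p.free_gpus)
        chosen.free_gpus -= gpus_needed
        return chosen


control = ControlPlane()
control.register(WorkloadPlane("aws-us-east", "aws", "us-east", 4))
control.register(WorkloadPlane("gcp-eu-west", "gcp", "eu-west", 8))
control.register(WorkloadPlane("gcp-us-east", "gcp", "us-east", 6))
```

Calling `control.place(2, preferred_region="us-east")` selects among the two `us-east` planes only, while `control.place(2)` considers every registered plane regardless of cloud or region. A real control plane would also need to handle concerns the sketch ignores, such as stale capacity data and failed workload planes.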