ML Ops Platform at Cloudflare
Cloudflare, an internet security company, has detailed its Machine Learning Operations (MLOps) approach, which helps it secure applications and APIs built with AI. The company describes its strategy for creating robust ML models, covering data collection, model training, validation, deployment, and monitoring. The framework is designed to provide a consistent pipeline from data to model, and then from model to inference. Cloudflare has curated a set of model templates that serve as production-ready data science repositories with example models. The templates themselves are deployed through to production so that they remain a stable foundation for future projects. Starting a new project takes a single Makefile command, which builds a new CI/CD project in the user's chosen Git project. For orchestration, Cloudflare uses directed acyclic graphs (DAGs), a flowchart-like orchestration paradigm that weaves together the steps from data to model and from model to inference. The company has experimented with several orchestrators, including Apache Airflow, Argo Workflows, Kubeflow Pipelines, and Temporal. On the hardware side, it uses GPUs for core datacenter workloads and edge inference, and relies on observability and metrics consumed by Prometheus to track orchestration performance, maximize hardware utilization, and operate within a Kubernetes-native experience. Adoption is an important aspect of MLOps, and Cloudflare has had the most success when it helps teams get projects started and shapes their pipelines early. To enable collaboration across teams, it provides shared components such as notebooks, orchestration, data versioning (DVC), feature engineering (Feast), and model versioning (MLflow). Overall, Cloudflare's MLOps approach draws on the power of its network and a consistent data-to-model and model-to-inference pipeline to help secure applications and APIs built with AI.
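To make the DAG idea in the summary concrete, the following is a minimal sketch of how the named stages could be ordered as a directed acyclic graph. It uses only Python's standard-library `graphlib`, not any of the orchestrators mentioned above, and the step names, the `PIPELINE` mapping, and the `execution_order` helper are hypothetical illustrations rather than Cloudflare's actual pipeline definition.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
# The stage names mirror those listed in the summary; the structure is assumed.
PIPELINE = {
    "data_collection": set(),
    "model_training": {"data_collection"},
    "validation": {"model_training"},
    "deployment": {"validation"},
    "monitoring": {"deployment"},
}


def execution_order(dag):
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a cycle, i.e. it is
    not a valid DAG.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print("run:", step)
```

A real orchestrator adds scheduling, retries, and distributed execution on top of this ordering, but the core constraint is the same: a step runs only after all of its upstream steps have finished, and a cyclic dependency is rejected.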
Company
Cloudflare
Date published
Dec. 7, 2023
Author(s)
Keith Adler, Rio Harapan Pangihutan
Word count
1833
Language
English
Hacker News points
None found.