
Lessons Learned from Scaling Up Cloudflare’s Anomaly Detection Platform

What's this blog post about?

Cloudflare's Bot Management takes a "defense in depth" approach, combining multiple detection systems into a robust platform. One of these systems is Anomaly Detection, which identifies bots by modeling the characteristics of legitimate user traffic as a healthy baseline and flagging traffic that deviates from it. The algorithm used for this purpose is Histogram-Based Outlier Scoring (HBOS), which detects global outliers in linear time. Anomaly Detection processes over 500K requests per second and contributes to more than 200K CAPTCHAs issued per minute, identifying suspected bots from over 140 countries and 2,200 ASNs using automatically generated baselines and visitor models unique to each enrolled site.

The platform consists of a series of microservices running on Kubernetes: request data arrives on a dedicated Kafka topic and is inserted into ClickHouse and Redis for analysis, the Detector service calculates outlier scores for visitors against the baselines, and the Publisher service sends detections to the edge for use in bot score calculations. The platform has grown significantly since its launch, with Redis optimizations, refinements to the microservices architecture, and overall scalability improvements. Future work includes expanding into a problem space with huge cardinality, delivering better detection accuracy on sites that serve multiple traffic types, and enhancing support and education for the team behind Anomaly Detection and Bot Management.
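
The scoring idea mentioned above, HBOS, builds one histogram per feature from the healthy baseline and scores a visitor by how unlikely its feature values are under those histograms. The following is a minimal sketch of that idea, not Cloudflare's actual implementation; the function names, bin count, and synthetic features are illustrative assumptions.

```python
import numpy as np

def fit_histograms(baseline, bins=10):
    """Build one histogram (density + bin edges) per feature from baseline traffic."""
    histograms = []
    for col in range(baseline.shape[1]):
        density, edges = np.histogram(baseline[:, col], bins=bins, density=True)
        histograms.append((density, edges))
    return histograms

def hbos_score(sample, histograms, eps=1e-9):
    """Sum of negative log densities across features; higher means more anomalous."""
    score = 0.0
    for value, (density, edges) in zip(sample, histograms):
        # Locate the bin for this value; clamp so out-of-range values use the edge bins.
        idx = int(np.clip(np.searchsorted(edges, value, side="right") - 1, 0, len(density) - 1))
        score += -np.log(density[idx] + eps)
    return score

# Example: score a typical visitor and an unusual one against a synthetic baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=(10_000, 3))  # stand-in for healthy traffic features
histograms = fit_histograms(baseline)

print(hbos_score(np.array([0.1, -0.2, 0.3]), histograms))  # low score: looks like baseline
print(hbos_score(np.array([6.0, -5.0, 7.0]), histograms))  # high score: global outlier
```

Because each score is just a lookup and sum over per-feature histograms, scoring stays linear in the number of features, which is what makes this approach cheap enough to run against a high-volume request stream.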

Company
Cloudflare

Date published
March 12, 2021

Author(s)
Jeffrey Tang

Word count
2075

Hacker News points
1

Language
English

