Training a million models per day to save customers of all sizes from DDoS attacks

Company

Cloudflare

Date Published

Oct. 23, 2024

Author

Nick Wood, Manish Arora

Word count

2159

Language

English

Hacker News points

None

URL

blog.cloudflare.com/training-a-million-models-per-day-to-save-customers-of-all-sizes-from-ddos

Summary

The post describes the difficulty of detecting Distributed Denial of Service (DDoS) attacks and presents an anomaly detection pipeline that Cloudflare built to identify unmitigated or partially mitigated attacks. A naive volumetric model proves ineffective because it assumes traffic volume is stable over time, which rarely holds in practice. Time series forecasting methods are also considered but deemed impractical for various reasons. Cloudflare's solution measures traffic along multiple dimensions and looks at the correlations between those variables. Through careful analysis, Cloudflare found a dozen variables that follow a normal distribution, are not correlated with volume, and deviate from that distribution during "under attack" events. Principal Component Analysis (PCA) transforms the multidimensional data cloud into a roughly spherical shape, so that a point's distance from the center of the cloud can serve as its anomaly score. The process is highly parallelizable and can be scaled horizontally as needed. Cloudflare currently retrains the models every day but may reduce this frequency in the future because intraday model drift is minimal. The company trains models for a large sample of representative customers, including those on the Free plan, to identify attacks for further study and to tune existing DDoS systems for all customers.
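
The PCA step can be illustrated with a minimal sketch. It is not Cloudflare's implementation: it assumes that making the cloud spherical means PCA whitening (rotating onto the principal axes and rescaling each axis to unit variance), and it uses three synthetic Gaussian features in place of the real traffic variables. The function names `fit_pca_whitener` and `anomaly_score` are hypothetical.

```python
import numpy as np

def fit_pca_whitener(X, eps=1e-9):
    """Fit centering + PCA whitening on baseline feature rows (assumed approach)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh is appropriate because a covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Guard against near-zero variance along a principal axis.
    scale = 1.0 / np.sqrt(np.maximum(eigvals, eps))
    return mean, eigvecs, scale

def anomaly_score(X, mean, eigvecs, scale):
    """Distance from the center of the cloud after whitening."""
    Z = ((X - mean) @ eigvecs) * scale
    return np.linalg.norm(Z, axis=1)

# Synthetic stand-in for normal traffic: three correlated Gaussian features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.8, 1.0, 0.0],
                   [0.3, -0.5, 0.5]])
center = np.array([10.0, 5.0, 1.0])
baseline = rng.standard_normal((5000, 3)) @ mixing.T + center

mean, eigvecs, scale = fit_pca_whitener(baseline)
baseline_scores = anomaly_score(baseline, mean, eigvecs, scale)

# Hypothetical "under attack" sample: the third feature drifts far from normal.
attack = center + np.array([0.0, 0.0, 5.0])
attack_score = anomaly_score(attack[None, :], mean, eigvecs, scale)[0]

# An illustrative threshold taken from the baseline score distribution.
threshold = np.percentile(baseline_scores, 99.9)
print(f"threshold={threshold:.2f} attack_score={attack_score:.2f}")
```

After whitening, the Euclidean distance from the center equals the Mahalanobis distance in the original feature space, so one threshold applies in every direction. Each fit depends only on one customer's own data, which is consistent with the summary's point that the process parallelizes well.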