Improving Data Loss Prevention accuracy with AI-powered context analysis

Company

Cloudflare

Date Published

March 21, 2025

Author

Warnessa Weaver, Tom Shen, Joshua Johnson

Word count

1323

Language

English

Hacker News points

None

URL

blog.cloudflare.com/improving-data-loss-prevention-accuracy-with-ai-context-analysis

Summary

We've developed a self-improving, AI-powered algorithm that adapts to each organization's unique traffic patterns to reduce false positives in Cloudflare's Data Loss Prevention (DLP) solution. Built into the DLP Engine, the algorithm uses a pretrained language model to convert text into high-dimensional vectors that capture its meaning, so that sentences with similar meaning but different wording map to nearby vectors. The system then performs a nearest neighbor search over previously logged false positives and true positives to find ones with similar meaning, which lets it recognize a familiar context even when the exact wording differs. This approach has proven robust when handling new pattern matches and reduces false positives over time. The solution is built on Cloudflare's Developer Platform, including Workers AI and Vectorize. Using these services simplifies the design and lets the team focus on the algorithm itself rather than on provisioning the underlying resources.
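The nearest neighbor step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Cloudflare's implementation. The short hand-written vectors stand in for the pretrained language model's embeddings, and the linear scan stands in for a vector index such as Vectorize. The names `LoggedMatch` and `likely_false_positive`, the cosine metric, `k`, the 0.8 similarity threshold, and the majority vote are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class LoggedMatch:
    # Hypothetical record: an embedding of a past DLP match and its reviewed label.
    vector: list[float]
    is_false_positive: bool


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(query: list[float], history: list[LoggedMatch], k: int):
    """Brute-force nearest neighbor search; a real system would use a vector index."""
    scored = [(cosine_similarity(query, m.vector), m) for m in history]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def likely_false_positive(query, history, k=3, min_similarity=0.8):
    """Return True/False from a majority vote of close neighbors, or None if none are close."""
    neighbors = [(s, m) for s, m in k_nearest(query, history, k) if s >= min_similarity]
    if not neighbors:
        return None  # no similar precedent: keep the raw pattern-match verdict
    fp_votes = sum(1 for _, m in neighbors if m.is_false_positive)
    return fp_votes > len(neighbors) / 2


# Toy history: two contexts previously marked as false positives, one as a true positive.
history = [
    LoggedMatch([0.9, 0.1, 0.0], True),
    LoggedMatch([0.85, 0.15, 0.05], True),
    LoggedMatch([0.0, 0.2, 0.95], False),
]
print(likely_false_positive([0.88, 0.12, 0.02], history))  # close to the false positives
print(likely_false_positive([0.05, 0.2, 0.9], history))    # close to the true positive
print(likely_false_positive([0.0, 1.0, 0.0], history))     # no close precedent
```

Because verdicts come from labeled neighbors rather than fixed rules, each newly reviewed match that is logged with its label becomes another reference point. This is one way a system like the one described could improve as it sees more of an organization's traffic.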