Company
Date Published
Dec. 18, 2020
Author
Marek Majkowski
Word count
2437
Language
English
Hacker News points
6

Summary

The text discusses the optimization journey of an image processing pipeline at Cloudflare. It delves into various algorithms and techniques used to improve the speed of the matching algorithm in their CSAM image scanning tool. Initially, a naive quadratic algorithm was employed, but it proved to be slow. SIMD instructions were then utilized for improvement, but the performance still fell short of expectations. The Vantage Point Tree algorithm was attempted, but due to the large dimensionality of the problem, it failed to narrow down the search effectively. The authors discovered that they didn't need to fetch all data from memory and could compute only a subset of dimensions for each hash. This led to the development of a smarter brute force algorithm, which was further optimized by transposing data for better AVX usage. The final algorithm achieved significant speed improvements, reducing query times from 55ms to just 0.73ms. The text also highlights the challenges faced in making technologies work at Cloudflare scale and emphasizes the need for continuous optimization efforts.