
The problem with thread^W event loops

What's this blog post about?

The article discusses the event loop architecture used by NGINX, along with its advantages and limitations for handling HTTP requests. An event loop works well when every piece of work finishes quickly, but a single task that runs too long stalls the loop and blocks the other requests it is serving. The author highlights why this matters for Cloudflare's workload, particularly the Web Application Firewall (WAF), which needs comparatively more CPU time and could therefore slow down unrelated requests handled by the same worker. The article explores several mitigations: increasing the number of worker processes, moving the WAF into a separate service, or offloading CPU-intensive tasks to a thread pool. The author explains that NGINX already uses thread pools for filesystem operations, and that Cloudflare repurposed this mechanism to offload WAF processing. The article presents performance metrics from before and after moving the WAF into thread pools, showing significant improvements in time to first byte (TTFB) and accept latency. The author concludes that event-loop-based processing is advantageous in many situations but is vulnerable when individual tasks need a lot of CPU time; in such cases, a separate service or a thread pool may be the best tradeoff.
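To make the blocking problem concrete, here is a minimal sketch using Python's asyncio rather than NGINX's C internals. The names `inspect_request`, `heavy_request`, `quick_request`, and `serve` are hypothetical, and the busy loop in `inspect_request` is only a stand-in for WAF inspection, not Cloudflare's code. When the CPU-heavy step runs directly on the loop thread, a cheap request queued behind it cannot finish until the heavy one does; when the same step is handed to a thread pool with `loop.run_in_executor`, the loop stays free and the cheap request completes first.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def inspect_request(iterations: int) -> int:
    """Hypothetical stand-in for CPU-heavy WAF inspection (just burns CPU)."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


async def quick_request(log: List[str], start: float) -> None:
    """A cheap request: yields to the loop once, then finishes."""
    await asyncio.sleep(0)
    log.append("quick")
    print(f"quick request finished after {time.perf_counter() - start:.3f}s")


async def heavy_request(
    log: List[str], pool: Optional[ThreadPoolExecutor], iterations: int
) -> None:
    if pool is None:
        # Inline: the loop thread does the CPU work, so no other task runs meanwhile.
        inspect_request(iterations)
    else:
        # Offloaded: the work runs on a pool thread while the loop keeps serving.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(pool, inspect_request, iterations)
    log.append("heavy")


async def serve(offload: bool, iterations: int = 2_000_000) -> List[str]:
    """Start one heavy and one quick request together; return completion order."""
    log: List[str] = []
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        await asyncio.gather(
            heavy_request(log, pool if offload else None, iterations),
            quick_request(log, start),
        )
    return log


if __name__ == "__main__":
    print("inline:   ", asyncio.run(serve(offload=False)))
    print("offloaded:", asyncio.run(serve(offload=True)))
```

In the inline run the quick request completes after the heavy one and its printed latency grows with `iterations`; in the offloaded run it should complete first with a small latency. One caveat specific to this sketch: CPython's global interpreter lock keeps pure-Python work in the pool thread from running in parallel with the loop, so offloading here improves responsiveness for other requests rather than total throughput. NGINX's thread pools use native C threads, so that particular limitation does not apply there.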

Company
Cloudflare

Date published
March 18, 2020

Author(s)
Julien Desgats

Word count
1887

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.