The problem with thread^W event loops
The article examines the event loop architecture that NGINX uses to handle HTTP requests, along with its advantages and limitations. An event loop works well when every piece of work finishes quickly, but a single task that takes too long blocks the other requests waiting on the same loop. The author describes how this affects Cloudflare's workload, in particular the Web Application Firewall (WAF), which is CPU-intensive and can slow down unrelated requests. The article weighs several mitigations: increasing the number of worker processes, moving the WAF into a separate service, or offloading CPU-intensive tasks to a thread pool. Because NGINX already uses thread pools for filesystem operations, the author repurposed that mechanism to offload WAF processing. Performance metrics from before and after the change show significant improvements in Time To First Byte (TTFB) and accept latency. The author concludes that event-loop-based processing is advantageous in many situations but becomes vulnerable when a task needs a lot of CPU time; in those cases, a separate service or a thread pool may be the best tradeoff.
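The blocking behaviour and the thread pool remedy summarized above can be illustrated with a minimal sketch. It uses Python's asyncio rather than NGINX, and every name in it (`inspect_request`, `handle_slow`, `handle_fast`, `serve`, `run_demo`) is hypothetical. The `time.sleep` call is only an assumed stand-in for CPU-heavy WAF work, so this shows the scheduling effect and is not the article's implementation.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def inspect_request(duration):
    """Hypothetical stand-in for a CPU-heavy WAF check (simulated with a blocking sleep)."""
    time.sleep(duration)
    return "waf-done"


async def handle_slow(order, pool, duration):
    loop = asyncio.get_running_loop()
    if pool is None:
        # Runs on the event loop thread: nothing else can progress meanwhile.
        inspect_request(duration)
    else:
        # Runs on a pool thread: the event loop stays free for other requests.
        await loop.run_in_executor(pool, inspect_request, duration)
    order.append("slow")


async def handle_fast(order):
    # An unrelated, cheap request that only needs one turn of the loop.
    await asyncio.sleep(0)
    order.append("fast")


async def serve(pool, duration=0.05):
    order = []
    # The slow request is scheduled first, as if it had arrived first.
    await asyncio.gather(handle_slow(order, pool, duration), handle_fast(order))
    return order


def run_demo():
    inline = asyncio.run(serve(None))
    with ThreadPoolExecutor(max_workers=1) as pool:
        offloaded = asyncio.run(serve(pool))
    return inline, offloaded


if __name__ == "__main__":
    print(run_demo())
```

In the inline case the cheap request can only complete after the slow one has finished, while in the offloaded case it completes first because the loop is never blocked. Note that in CPython truly CPU-bound Python code in a thread still contends for the interpreter lock, which is one reason this sketch simulates the work with a sleep.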
Company
Cloudflare
Date published
March 18, 2020
Author(s)
Julien Desgats
Word count
1887
Hacker News points
None found.
Language
English