Why does one NGINX worker take all the load?
The post compares three ways of designing a performant TCP server: (a) a single listen socket with a single worker process; (b) a single listen socket shared by multiple worker processes; and (c) multiple worker processes, each with its own listen socket. Adding worker processes gets past the single-CPU-core bottleneck, but it raises the question of how the accept() load is spread across the processes, and Linux answers it differently depending on whether the workers block in accept() or wait with epoll: blocking accept() distributes connections round-robin (FIFO), while epoll wakes the most recently active process (LIFO), so a single busy worker ends up taking most of the load. SO_REUSEPORT works around this balancing problem by splitting incoming connections into multiple separate accept queues, one per listen socket, which spreads the load far more evenly. The trade-off is latency: in the author's measurements the average stays comparable, but the maximum increases significantly and, most importantly, the deviation becomes gigantic, since a connection queued behind a busy worker cannot be picked up by an idle one, leaving the server in a degraded-latency state. The post concludes that changing the standard epoll behavior from LIFO to FIFO would be a better solution. Both setups are sketched in the examples below.
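A minimal sketch of model (b), where workers share one listen socket and wait in epoll. This is not the article's code: EPOLLEXCLUSIVE (added in Linux 4.5) suppresses the thundering herd by waking a single waiter per connection, but the wakeup order is still effectively LIFO, which is exactly the imbalance the post measures. The port, worker count, and error handling are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    /* One shared listen socket, created before forking. */
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); exit(1); }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);          /* illustrative port */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 128) < 0) {
        perror("bind/listen");
        exit(1);
    }

    for (int i = 0; i < 3; i++) {         /* fork the workers */
        if (fork() != 0)
            continue;

        int ep = epoll_create1(0);
        if (ep < 0) { perror("epoll_create1"); exit(1); }

        struct epoll_event ev = { 0 };
        /* EPOLLEXCLUSIVE: wake only one worker per new connection,
         * but the kernel still tends to pick the most recently
         * queued waiter, so load stays skewed toward one worker. */
        ev.events = EPOLLIN | EPOLLEXCLUSIVE;
        ev.data.fd = listen_fd;
        if (epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
            perror("epoll_ctl");
            exit(1);
        }

        for (;;) {
            struct epoll_event out;
            if (epoll_wait(ep, &out, 1, -1) < 1)
                continue;
            /* A real server would make listen_fd non-blocking here. */
            int conn = accept(listen_fd, NULL, NULL);
            if (conn >= 0)
                close(conn);              /* handle, then close */
        }
    }
    for (;;) pause();                     /* parent just waits */
}
```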
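Model (c) boils down to each worker setting SO_REUSEPORT on its own socket before bind(), so the kernel hashes incoming connections across the per-worker accept queues. The sketch below is likewise not taken from the article; the port number, worker count, and backlog are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int reuseport_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); exit(1); }

    int one = 1;
    /* Every worker sets SO_REUSEPORT before bind(); the kernel then
     * splits incoming connections into separate accept queues, one
     * per socket bound to the same address:port. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        perror("setsockopt(SO_REUSEPORT)");
        exit(1);
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        exit(1);
    }
    if (listen(fd, 128) < 0) { perror("listen"); exit(1); }
    return fd;
}

int main(void)
{
    for (int i = 0; i < 3; i++) {         /* fork the workers */
        if (fork() == 0) {
            int fd = reuseport_listener(8080);
            for (;;) {
                int conn = accept(fd, NULL, NULL);
                if (conn >= 0)
                    close(conn);          /* handle, then close */
            }
        }
    }
    for (;;) pause();                     /* parent just waits */
}
```

The separate queues are what produce both effects the post reports: even load distribution, but no way for an idle worker to drain a connection already queued to a busy one.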
Company: Cloudflare
Date published: Oct. 23, 2017
Author(s): Marek Majkowski
Word count: 1663
Language: English
Hacker News points: 164