The revenge of the listening sockets
In November 2015, a Cloudflare blog post described a latency spike caused by misconfigured rmem settings. The issue persisted, however, even after the sysctl was adjusted. This follow-up traces the remaining problem to high latency in the kernel's soft interrupt (soft IRQ) handling code, which is also responsible for processing ICMP pings. SystemTap scripts measuring the time distribution of the main soft IRQ function, net_rx_action, showed that most calls were handled quickly but some took up to 32ms. Further investigation attributed the latency to a slow path in __inet_lookup_listener, the function that finds the appropriate listening socket (struct sock) for an incoming packet. The issue was resolved by deploying two changes: binding the TCP sockets of Cloudflare's DNS server to the ANY_IP address (0.0.0.0:53), and increasing the LHTABLE size in Cloudflare's kernels.
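To make the slow path more concrete, here is a toy model of a listening-socket table. It is only a sketch under stated assumptions, not the kernel's code: it assumes the bucket is chosen from the destination port alone and that a lookup scans the whole chain for the best match. The function names, the default bucket count of 32 and the 1,000 per-IP listeners are hypothetical values chosen for illustration.

```python
def build_listen_table(listeners, buckets=32):
    """Toy listening table. ASSUMPTION: the bucket depends on the port only."""
    table = {}
    for ip, port in listeners:
        table.setdefault(port % buckets, []).append((ip, port))
    return table


def lookup_listener(table, dst_ip, dst_port, buckets=32):
    """Scan the whole chain for the best match.

    Returns (best_match, entries_visited). An exact IP match beats a
    wildcard (0.0.0.0) match, so the scan cannot stop at the first hit.
    """
    best = None
    visited = 0
    for ip, port in table.get(dst_port % buckets, []):
        visited += 1
        if port != dst_port:
            continue
        if ip == dst_ip:
            best = (ip, port)
        elif ip == "0.0.0.0" and best is None:
            best = (ip, port)
    return best, visited


# Hypothetical setup: one listener per IP address, all on port 53.
per_ip = build_listen_table(
    [(f"10.0.{i // 256}.{i % 256}", 53) for i in range(1000)]
)
# Alternative setup: a single listener bound to the wildcard address.
wildcard = build_listen_table([("0.0.0.0", 53)])

per_ip_result = lookup_listener(per_ip, "10.0.0.5", 53)
wildcard_result = lookup_listener(wildcard, "10.0.0.5", 53)
```

In this model every per-IP listener on port 53 lands in the same chain, so each lookup visits all 1,000 entries, while the wildcard setup visits one. Under the same assumption, a larger table does not split up the port 53 chain, though it does keep listeners on other ports out of it.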
Company
Cloudflare
Date published
April 5, 2016
Author(s)
Marek Majkowski
Word count
1340
Language
English
Hacker News points
13