The story of one latency spike
A customer reported slow HTTP responses from Cloudflare CDN servers. The issue was not easily reproducible and went unnoticed by the usual monitoring systems. Investigation revealed latency spikes between the router and the server inside Cloudflare's own datacenter. SystemTap, a tracing and debugging tool for Linux, identified the kernel function tcp_collapse as the cause of the spikes. The issue was resolved by adjusting the tcp_rmem sysctl to limit the receive buffer size on TCP sockets, which shortened the time tcp_collapse spends garbage-collecting the receive queue and so reduced the latency spikes.
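As a rough illustration of the setting involved, the sketch below reads the three tcp_rmem values (minimum, default, and maximum receive buffer size in bytes) and checks the maximum against a cap. The helper names, the sample values, and the 2 MiB cap are assumptions made for this example, not figures taken from the summary above.

```python
from pathlib import Path
from typing import NamedTuple, Optional

# Standard procfs location of the sysctl on Linux.
TCP_RMEM_PATH = Path("/proc/sys/net/ipv4/tcp_rmem")


class TcpRmem(NamedTuple):
    """The three tcp_rmem fields, all in bytes."""
    min_bytes: int
    default_bytes: int
    max_bytes: int


def parse_tcp_rmem(text: str) -> TcpRmem:
    """Parse the whitespace-separated 'min default max' triple."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return TcpRmem(*(int(field) for field in fields))


def read_tcp_rmem(path: Path = TCP_RMEM_PATH) -> Optional[TcpRmem]:
    """Return the current setting, or None if it cannot be read (e.g. not Linux)."""
    try:
        return parse_tcp_rmem(path.read_text())
    except (OSError, ValueError):
        return None


def exceeds_cap(rmem: TcpRmem, cap_bytes: int) -> bool:
    """True when the maximum receive buffer is larger than the chosen cap."""
    return rmem.max_bytes > cap_bytes


if __name__ == "__main__":
    HYPOTHETICAL_CAP = 2 * 1024 * 1024  # 2 MiB, illustrative only
    current = read_tcp_rmem()
    if current is None:
        print("tcp_rmem is not readable on this system")
    else:
        print(current, "exceeds cap:", exceeds_cap(current, HYPOTHETICAL_CAP))
```

Changing the value itself requires root, for example through `sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"`; a suitable maximum depends on the workload and is deliberately left unspecified here.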
Company
Cloudflare
Date published
Nov. 19, 2015
Author(s)
Marek Majkowski
Word count
1462
Hacker News points
None found.
Language
English