Company
Date Published
Nov. 19, 2015
Author
Marek Majkowski
Word count
1462
Language
English
Hacker News points
10

Summary

A customer reported slow HTTP responses from CloudFlare CDN servers. The problem was hard to reproduce and the usual monitoring systems did not catch it. Investigation showed intermittent latency spikes between the router and the server inside CloudFlare's datacenter. SystemTap, a tracing and debugging tool for Linux, pinpointed the kernel function responsible for the spikes: tcp_collapse. The fix was to adjust the rmem sysctl (net.ipv4.tcp_rmem) to limit the receive buffer size on TCP sockets. A smaller buffer shortens the time the kernel spends collapsing the receive queue, a form of garbage collection, and so removes the latency spikes.
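As an illustration of the fix described in the summary, the following is a minimal sketch of capping the maximum TCP receive buffer. It assumes the standard Linux `net.ipv4.tcp_rmem` triple of `min default max` byte values. The 2 MiB cap and the helper names (`parse_tcp_rmem`, `cap_tcp_rmem`, `format_sysctl`) are hypothetical choices made for this example and are not taken from the article. The script only prints a suggested setting; it does not change the system.

```python
from pathlib import Path
from typing import Tuple

# Standard procfs location of the sysctl on Linux.
TCP_RMEM_PATH = Path("/proc/sys/net/ipv4/tcp_rmem")

# Hypothetical cap chosen for illustration only (2 MiB).
EXAMPLE_MAX_BYTES = 2 * 1024 * 1024

Triple = Tuple[int, int, int]


def parse_tcp_rmem(text: str) -> Triple:
    """Parse the 'min default max' triple (in bytes) used by tcp_rmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected three integers, got {text!r}")
    low, default, high = (int(field) for field in fields)
    return low, default, high


def cap_tcp_rmem(values: Triple, max_bytes: int) -> Triple:
    """Lower the max to max_bytes, keeping min <= default <= max."""
    low, default, high = values
    high = min(high, max_bytes)
    default = min(default, high)
    low = min(low, default)
    return low, default, high


def format_sysctl(values: Triple) -> str:
    """Render the triple as a line suitable for a sysctl configuration file."""
    return "net.ipv4.tcp_rmem = {} {} {}".format(*values)


if __name__ == "__main__":
    # Read the current setting and print a capped suggestion.
    current = parse_tcp_rmem(TCP_RMEM_PATH.read_text())
    print("current:  ", format_sysctl(current))
    print("suggested:", format_sysctl(cap_tcp_rmem(current, EXAMPLE_MAX_BYTES)))
```

A lower maximum bounds how much queued data `tcp_collapse` may have to walk in one pass, at the cost of lower peak throughput on high-latency connections, so the right value depends on the workload.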