Tracing System CPU on Debian Stretch
Elevated system CPU usage on servers running Kafka was traced to a regression in the Linux kernel introduced between versions 4.9 and 4.10, which degraded performance. Bisection pinpointed the version where the issue first appeared. The fix was to enable TCP segmentation offload (TSO) and other network offloading features on VLAN interfaces, which significantly reduced CPU usage. As a workaround, these offloading features are now enabled automatically at boot whenever they are found disabled on a VLAN interface. A ticket was also filed upstream with systemd regarding this issue.
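The boot-time workaround described above could look roughly like the following Python sketch. It is not the author's script. It assumes the offload state is read from the text printed by `ethtool -k <interface>`, and the interface name `vlan10`, the set of features in `WANTED`, and both helper functions are hypothetical names chosen for illustration.

```python
# Long feature names as printed by `ethtool -k`, mapped to the short
# flags accepted by `ethtool -K`. Which features to re-enable is an
# assumption made for this sketch.
WANTED = {
    "scatter-gather": "sg",
    "tcp-segmentation-offload": "tso",
    "generic-segmentation-offload": "gso",
}


def disabled_offloads(ethtool_k_output):
    """Return short flags for wanted features that are reported as off."""
    disabled = []
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or name not in WANTED:
            continue
        state = value.split()
        # Features marked "[fixed]" cannot be changed, so skip them.
        if state and state[0] == "off" and "[fixed]" not in state:
            disabled.append(WANTED[name])
    return disabled


def enable_command(interface, flags):
    """Build the argument list that would turn the given flags on."""
    if not flags:
        return None
    args = ["ethtool", "-K", interface]
    for flag in flags:
        args += [flag, "on"]
    return args


# Hand-written sample resembling `ethtool -k` output (not real data).
SAMPLE = """Features for vlan10:
rx-checksumming: on
scatter-gather: off
\ttx-scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: on
rx-vlan-offload: off [fixed]
"""

if __name__ == "__main__":
    print(enable_command("vlan10", disabled_offloads(SAMPLE)))
```

In a real boot hook, the sample text would be replaced by the captured output of `ethtool -k` for each VLAN interface, and the returned argument list would be passed to `subprocess.run`. Keeping parsing separate from execution makes the check easy to test without root privileges.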
Company
Cloudflare
Date published
May 13, 2018
Author(s)
Ivan Babrou
Word count
2974
Hacker News points
None found.
Language
English