connect() - why are you so slow?
Cloudflare describes three ways to relieve the port-selection bottleneck that slows connect() on TCP sockets:
1. "Select, test, repeat": create a socket and try to connect, using a different source port on each attempt until a free one is found. This method can be time-consuming.
2. "Select port by random shifting range": generate a random offset within the ephemeral port range and try to bind within that shifted range. If that fails, shift the range again at random until a free port is found.
3. A kernel patch, available in Linux 6.8 and later, that removes the need for window shifting. Like "select port by random shifting range", it randomizes the start offset so that it may be even or odd, but it then loops incrementally rather than skipping every other port.
The two user-space approaches outperform TCP's default port selection. The kernel patch is slightly faster still, thanks to algorithmic improvements and because it can always find a port given the full search space of the range. These strategies can reduce connect() latency for workloads with large numbers of unicast egress connections. Other protocols such as UDP and DCCP also benefit, although they differ somewhat in how ports are selected and managed. Measure your own systems to determine which strategy works best for your needs.
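The two user-space strategies described above could be sketched in Python roughly as follows. This is a minimal illustration, not the article's code: the function names, the default port range, the window size, and the retry limits are all hypothetical. It also assumes that the shifted range is applied through the Linux `IP_LOCAL_PORT_RANGE` socket option (kernel 6.3 or later) together with `IP_BIND_ADDRESS_NO_PORT`, and that `SO_REUSEADDR` is set for the explicit bind; the summary does not state which options the original relies on.

```python
import errno
import random
import socket
import struct

# Assumption: default Linux ephemeral range; the live values are in
# /proc/sys/net/ipv4/ip_local_port_range.
PORT_LO, PORT_HI = 32768, 60999

# Linux option numbers, used as fallbacks when this Python build does not
# export the names.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)
IP_LOCAL_PORT_RANGE = getattr(socket, "IP_LOCAL_PORT_RANGE", 51)

# errno values treated as "port or window is taken, try another".
RETRYABLE = (errno.EADDRINUSE, errno.EADDRNOTAVAIL)


def connect_select_test_repeat(dest, src_ip="0.0.0.0",
                               lo=PORT_LO, hi=PORT_HI, attempts=64):
    """Strategy 1: pick one source port, test it, repeat on failure."""
    for _ in range(attempts):
        port = random.randint(lo, hi)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((src_ip, port))
            s.connect(dest)
            return s
        except OSError as e:
            s.close()
            if e.errno not in RETRYABLE:
                raise
    raise OSError(errno.EADDRNOTAVAIL, "no free source port found")


def random_window(lo, hi, size, rng=random):
    """Return (start, end) of a `size`-port window at a random offset in [lo, hi]."""
    total = hi - lo + 1
    if size < 1 or size > total:
        raise ValueError("window does not fit in the port range")
    start = lo + rng.randrange(total - size + 1)
    return start, start + size - 1


def pack_port_range(lo, hi):
    """Encode a range as IP_LOCAL_PORT_RANGE expects: high 16 bits = upper bound."""
    return struct.pack("=I", (hi << 16) | lo)


def connect_random_shifted_range(dest, src_ip="0.0.0.0",
                                 lo=PORT_LO, hi=PORT_HI,
                                 window=500, attempts=16):
    """Strategy 2: let the kernel pick a port inside a randomly shifted window."""
    for _ in range(attempts):
        w_lo, w_hi = random_window(lo, hi, window)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.setsockopt(socket.IPPROTO_IP, IP_LOCAL_PORT_RANGE,
                         pack_port_range(w_lo, w_hi))
            # Defer the actual port choice until connect().
            s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
            s.bind((src_ip, 0))
            s.connect(dest)
            return s
        except OSError as e:
            s.close()
            if e.errno not in RETRYABLE:
                raise
    raise OSError(errno.EADDRNOTAVAIL, "no free port in any sampled window")
```

A call such as `connect_random_shifted_range(("192.0.2.10", 443))` would return a connected socket or raise `OSError`. The range is packed with `struct` rather than passed as a Python integer because an upper bound above 32767 does not fit in the signed C `int` that `setsockopt` uses for integer values.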
Company
Cloudflare
Date published
Feb. 8, 2024
Author(s)
Frederick Lawler
Word count
2902
Language
English
Hacker News points
5