
High Availability Load Balancers with Maglev

What's this blog post about?

The text discusses the implementation of a new load balancing service for backend services at Cloudflare, replacing an earlier setup built on stateful TCP proxies and NATs. The goals for the replacement were to preserve client source IPs, support backends spread across racks and subnets, allow zero-downtime maintenance, rely on common Linux tools, avoid explicit connection synchronization between load balancers, and permit a staged rollout from the previous implementation. The new architecture uses consistent hashing (Maglev hashing, per the title) so that every load balancer independently selects the same backend for a given connection without persisting any connection state. It combines the Bidirectional Forwarding Detection (BFD) protocol between routers and load balancers, IP Virtual Server (IPVS), and Foo-Over-UDP encapsulation to carry traffic from the load balancers to the backends. The team also built a Go agent, running on each load balancer, that synchronizes with a control plane tracking where services are located and which backend servers are available. The text concludes with future work: following IPVS developments, migrating to nftables, improving the handling of BGP session failures, and investigating Lightweight Tunnels. It also points to further reading on similar topics.
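To illustrate why consistent hashing lets load balancers agree without sharing connection state, here is a minimal sketch of Maglev-style lookup-table population in Go. It is not Cloudflare's code and not the kernel's IPVS implementation. The FNV-1a hashing with seed bytes, the tiny table size, and the names `hash64` and `buildMaglevTable` are illustrative assumptions. Each backend derives an `offset` and `skip` from its name, walks its own permutation of the table, and the backends take turns claiming empty slots. Because the result depends only on the backend list and the table size, every load balancer builds an identical table.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash64 returns a 64-bit FNV-1a hash of s, prefixed with a seed byte so one
// name can yield several independent-looking hashes (assumed hashing scheme).
func hash64(seed byte, s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte{seed})
	h.Write([]byte(s))
	return h.Sum64()
}

// buildMaglevTable fills a table of size m (m must be prime and >= 2) with
// backend indexes. Each backend visits slots in the order
// (offset + j*skip) % m and the backends take turns claiming empty slots.
func buildMaglevTable(backends []string, m int) []int {
	n := len(backends)
	table := make([]int, m)
	for i := range table {
		table[i] = -1 // -1 marks an unclaimed slot
	}
	if n == 0 {
		return table
	}
	offset := make([]uint64, n)
	skip := make([]uint64, n)
	for i, b := range backends {
		offset[i] = hash64(0, b) % uint64(m)
		skip[i] = hash64(1, b)%uint64(m-1) + 1 // in [1, m-1], so the walk covers every slot
	}
	next := make([]uint64, n) // position of each backend in its own permutation
	filled := 0
	for {
		for i := 0; i < n; i++ {
			// Skip slots another backend already claimed.
			c := (offset[i] + next[i]*skip[i]) % uint64(m)
			for table[c] >= 0 {
				next[i]++
				c = (offset[i] + next[i]*skip[i]) % uint64(m)
			}
			table[c] = i
			next[i]++
			filled++
			if filled == m {
				return table
			}
		}
	}
}

func main() {
	backends := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"} // hypothetical backends
	const m = 13                                             // tiny prime for illustration only
	table := buildMaglevTable(backends, m)

	// A flow key (hypothetical 5-tuple string) is hashed into the table.
	flow := "192.0.2.10:51234->203.0.113.5:443/tcp"
	fmt.Println("flow maps to", backends[table[hash64(2, flow)%m]])

	// Removing one backend reassigns its slots while most others stay put.
	smaller := buildMaglevTable(backends[:2], m)
	moved := 0
	for i := range table {
		if table[i] != smaller[i] {
			moved++
		}
	}
	fmt.Println("slots reassigned after removing a backend:", moved)
}
```

Because backends claim slots in round-robin turns, their shares of the table differ by at most one slot. Production tables are far larger than the number of backends, which keeps both the load split and the disruption from backend changes small.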

Company
Cloudflare

Date published
June 10, 2020

Author(s)
Terin Stock

Word count
2097

Hacker News points
18

Language
English

