Dealing with rejection (in distributed systems)
There are two ways to learn about distributed systems: by studying the literature, and by building and operating them in production. The author, who has spent 10 years building and operating large-scale distributed databases, has gained practical knowledge of what it takes to turn a design into an implementation that works at scale. Many topics in distributed systems are not well covered in the literature, however. One of them is backpressure, a critical detail that every good distributed system must get right to survive in production. The author discusses how they implemented backpressure for WarpStream, using signals such as memory usage and the number of inflight requests to decide when to reject work, and argues that this gives better performance and scalability than traditional rate-limiting approaches. The system is designed to feel "springy": it recovers immediately when additional resources are provided or load is removed.
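The load-based rejection described above can be illustrated with a minimal sketch. It is not WarpStream's implementation: the class name `Backpressure`, the limits `max_inflight` and `max_bytes`, and the helper `handle` are all hypothetical, and buffered request bytes stand in as a rough proxy for memory usage.

```python
import threading


class Backpressure:
    """Admit or reject requests based on current load, not a fixed rate.

    All names and limits here are illustrative assumptions.
    """

    def __init__(self, max_inflight, max_bytes):
        self.max_inflight = max_inflight  # assumed cap on concurrent requests
        self.max_bytes = max_bytes        # assumed cap on buffered request bytes
        self.inflight = 0
        self.bytes_in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self, size):
        """Reserve capacity and return True, or return False to reject at once."""
        with self._lock:
            if self.inflight >= self.max_inflight:
                return False
            if self.bytes_in_use + size > self.max_bytes:
                return False
            self.inflight += 1
            self.bytes_in_use += size
            return True

    def release(self, size):
        """Give capacity back once the request has finished."""
        with self._lock:
            self.inflight -= 1
            self.bytes_in_use -= size


def handle(bp, payload, process):
    """Reject quickly when overloaded; otherwise process and always release."""
    size = len(payload)
    if not bp.try_acquire(size):
        return "rejected"  # the caller is expected to back off and retry
    try:
        return process(payload)
    finally:
        bp.release(size)
```

Because admission depends only on what is in flight right now, capacity becomes available again the moment a request is released, which is one way to get the "springy" recovery the summary describes. A fixed requests-per-second limit, by contrast, has to be retuned whenever resources are added or removed.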
Company
WarpStream
Date published
Aug. 13, 2024
Author(s)
Richard Artoul
Word count
2458
Language
English
Hacker News points
12