Company
Date Published
Aug. 13, 2024
Author
Richard Artoul
Word count
2458
Language
English
Hacker News points
12

Summary

There are two ways to learn about distributed systems: by studying the literature, and by building and operating them in production. The author, who has spent 10 years building and operating large-scale distributed databases, has gained practical knowledge of what it takes to turn a design into an implementation that works at scale. Many of these practical topics are not well covered in the literature. One of them is backpressure, a critical detail that every good distributed system needs to get right to survive in production. The author discusses how they implemented a backpressure system for WarpStream that uses metrics such as memory usage and the number of inflight requests to decide when to push back on clients, and argues that this provides better performance and scalability than traditional rate-limiting approaches. The system is designed to make the distributed system feel "springy": it recovers immediately when additional resources are provided or load is removed.