Dealing with rejection (in distributed systems)

Company

WarpStream

Date Published

Aug. 13, 2024

Author

Richard Artoul

Word count

2458

Language

English

Hacker News points

URL

www.warpstream.com/blog/dealing-with-rejection-in-distributed-systems

Summary

There are two ways to learn about distributed systems: by studying the literature, and by building and operating them in production. The author, who has spent 10 years building and operating large-scale distributed databases, has picked up practical knowledge about what it takes to turn a design into an implementation that works at scale. Many of these practical topics are not well covered in the literature. One of them is backpressure, a critical detail that every distributed system needs to get right to survive in production. The author describes how backpressure is implemented in WarpStream: signals such as memory usage and the number of inflight requests trigger the rejection of new work, an approach the author argues performs and scales better than traditional rate limiting. The system is designed to feel "springy", so that it recovers immediately when additional resources are provided or load is removed.
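
As a rough illustration of the idea summarized above, the following Python sketch rejects new work once the number of inflight requests or the bytes held in memory cross a limit, and admits work again as soon as capacity is released. It is not WarpStream's implementation: the names AdmissionController, Backpressure, and handle, and the specific limits, are hypothetical and chosen only for this example.

```python
import threading


class Backpressure(Exception):
    """Raised when a request is rejected because the server is overloaded."""


class AdmissionController:
    """Hypothetical load-based admission control (not WarpStream's code).

    Tracks inflight requests and inflight bytes, and refuses new work
    whenever admitting it would exceed either limit.
    """

    def __init__(self, max_inflight_requests, max_inflight_bytes):
        self._max_requests = max_inflight_requests
        self._max_bytes = max_inflight_bytes
        self._requests = 0
        self._bytes = 0
        self._lock = threading.Lock()

    def try_acquire(self, size_bytes):
        """Reserve capacity for one request; return False if overloaded."""
        with self._lock:
            over_requests = self._requests + 1 > self._max_requests
            over_bytes = self._bytes + size_bytes > self._max_bytes
            if over_requests or over_bytes:
                return False
            self._requests += 1
            self._bytes += size_bytes
            return True

    def release(self, size_bytes):
        """Return capacity so that waiting load can be admitted at once."""
        with self._lock:
            self._requests -= 1
            self._bytes -= size_bytes


def handle(controller, payload, process):
    """Run process(payload) only if the controller admits the request."""
    size = len(payload)
    if not controller.try_acquire(size):
        # Reject quickly instead of queueing; the client is expected to retry.
        raise Backpressure("overloaded, retry later")
    try:
        return process(payload)
    finally:
        controller.release(size)
```

Because admission depends on the current load rather than on a fixed requests-per-second budget, capacity becomes available the moment a request finishes or a limit is raised, which is one way to get the "springy" behavior the summary describes.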