How We Tracked Down a Linux Kernel Bug with Fallout

What's this blog post about?

The article describes a complex bug that took weeks to debug and spanned multiple layers of the software stack. It started with a performance test that timed out after 16 hours because of an issue in the Linux kernel hrtimer code. Fallout, an open-source distributed systems testing service, was instrumental in iterating quickly and gathering new data to validate or invalidate guesses about the underlying bug. The author used tools such as nodetool tpstats, jstack, and a BPF script to understand the issue at different levels of the stack. The kernel bug, which caused a red-black tree to become inconsistent, had already been fixed upstream in Linux 5.12, but the fix had not yet been pulled into Ubuntu's kernel. The author recommends keeping a bag of tools and techniques for understanding an application's behavior at various levels of the stack, along with services like Fallout that automatically deploy and provision virtual machines for running tests.

Company
DataStax

Date published
Sept. 27, 2021

Author(s)
Matt Fleming

Word count
2754

Hacker News points
1

Language
English


By Matt Makai. 2021-2024.