Company
Date Published
March 3, 2021
Author
Ryland Goldstein
Word count
2207
Language
English
Hacker News points
None

Summary

Ryland Goldstein of Temporal, an MIT-licensed open-source platform for building highly reliable distributed applications, discusses a recent data-loss bug that affected users' mission-critical applications. The bug was caused by variable shadowing in Temporal's Go persistence code and surfaced only when Cassandra returned specific errors from failed transactions. The team initially suspected memory problems in the clusters running their persistence, but later traced the data loss to the shadowing bug. They have since taken steps to prevent similar issues, including adding tests for dependency-level failures and improving communication channels with users.