Introducing recovery thresholds for metric alerts
The text discusses the problem of unstable metrics causing frequent alerts and introduces recovery thresholds in Datadog to address this issue. Recovery thresholds work on the principle of hysteresis, similar to how a thermostat regulates temperature. They set different values for when an alert triggers and when it resolves, ensuring that an alert only enters the "recovered" state once a metric has passed the recovery threshold. This helps reduce noise and distraction from flapping monitors and increases confidence in issue resolution. Recovery thresholds can be set up during monitor creation via the UI or API, with separate thresholds for recovery from alert and warning states.
Company
Datadog
Date published
Nov. 6, 2017
Author(s)
Rachel Windzberg
Word count
364
Hacker News points
None found.
Language
English