
Monitoring 101: Investigating performance issues

What's this blog post about?

This article describes a systematic approach to diagnosing the root cause of infrastructure problems using monitoring data. It draws on three types of monitoring data: work metrics, resource metrics, and events. The investigation starts with top-level work metrics to characterize the problem, then examines the resources the system uses, then checks for changes or events that may correlate with the issue. The final step is to fix the problem and add more instrumentation if necessary. The article also recommends building dashboards in advance to speed up investigation during an outage.
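
As an illustration only, the order of investigation described above can be sketched in a few lines of Python. Nothing here comes from the article or from any Datadog API: the `Snapshot` class, the metric names, the thresholds, and the sample event are all hypothetical, chosen just to show work metrics being checked first, resource metrics second, and events last.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical monitoring data for one system (all fields are illustrative)."""
    name: str
    work_metrics: dict       # metric name -> (observed value, acceptable maximum)
    resource_metrics: dict   # metric name -> (observed value, acceptable maximum)
    events: list = field(default_factory=list)  # recent changes, e.g. deploys


def out_of_bounds(metrics):
    """Return the names of metrics whose observed value exceeds its maximum."""
    return [name for name, (observed, limit) in metrics.items() if observed > limit]


def investigate(snapshot):
    """Check work metrics first, then resource metrics, then recent events."""
    findings = []

    # Step 1: work metrics characterize the problem from the top level.
    bad_work = out_of_bounds(snapshot.work_metrics)
    if not bad_work:
        return findings  # no visible symptom, nothing to chase
    findings.append(("work", bad_work))

    # Step 2: look at the resources the system depends on.
    bad_resources = out_of_bounds(snapshot.resource_metrics)
    if bad_resources:
        findings.append(("resource", bad_resources))

    # Step 3: list recent changes that might correlate with the symptom.
    if snapshot.events:
        findings.append(("event", list(snapshot.events)))

    return findings


# Made-up sample data, not taken from the article.
web = Snapshot(
    name="web",
    work_metrics={"p95_latency_ms": (900, 300), "error_rate_pct": (0.2, 1.0)},
    resource_metrics={"cpu_pct": (97, 85), "memory_pct": (60, 90)},
    events=["example deploy"],
)

for kind, details in investigate(web):
    print(kind, details)
```

The sketch returns findings in the same order a person would read the dashboards, which is the point of the approach: symptoms first, likely causes after. Fixing the problem and adding instrumentation remain manual steps and are left out.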

Company
Datadog

Date published
July 16, 2015

Author(s)
Alexis Lê-Quôc

Word count
1050

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.