Monitoring 101: Investigating performance issues
This article describes a systematic approach to diagnosing the root cause of infrastructure problems using monitoring data. It highlights three types of monitoring data (work metrics, resource metrics, and events) that help identify issues. The process starts with top-level work metrics to characterize the problem, then examines the resources the system uses, then checks for changes or events that correlate with the issue, and ends with fixing the problem and adding more instrumentation if necessary. The article recommends building dashboards in advance to speed up investigation during an outage.
Company
Datadog
Date published
July 16, 2015
Author(s)
Alexis Lê-Quôc
Word count
1050
Hacker News points
None found.
Language
English