
Monitoring services and setting SLAs with Datadog

What's this blog post about?

Service Level Agreements (SLAs) help improve the performance and reliability of services, benefiting both service providers and users. An SLA is built on Service Level Objectives (SLOs), which set clear targets, and Service Level Indicators (SLIs), the metrics used to measure progress against those targets. To set reasonable SLAs and SLOs, teams need to collect data on key metrics such as latency, throughput, and error rates from user-facing applications and their subcomponents.

Datadog Synthetics simulates user conditions and helps establish performance expectations, while Datadog APM tracks real user interactions and identifies underperforming services or code-level inefficiencies. Analyzing the entire distribution of a metric, rather than just its average, yields more accurate insights. Datadog APM generates dashboards with SLI metrics for each internal service, allowing teams to define objectives that make sense for each component of their stack. Customizable dashboards let teams monitor and assess the health of services and the infrastructure-level components underneath them.

SLO-driven alerts can be configured to trigger in real time at increasing levels of severity as metrics approach internal SLO and external SLA thresholds. Integrating APM with infrastructure monitoring makes it possible to trace requests across services, which simplifies investigating issues and identifying bottlenecks anywhere in the stack. To implement an effective SLA strategy, organizations should first establish a monitoring platform, then set up dashboards and alerts that reflect their SLAs and key resources or services.
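As a rough illustration of why looking at the full distribution matters, and of how alerts might escalate as an SLI approaches internal and external thresholds, here is a minimal Python sketch. It does not use any Datadog API; the sample latencies, the 500 ms internal SLO, the 1000 ms external SLA, and the function names `percentile` and `severity` are all hypothetical values chosen for the example.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def severity(value_ms, internal_slo_ms, external_sla_ms):
    """Escalate as the measured value crosses the stricter internal SLO
    first, then the external SLA (both thresholds are assumptions)."""
    if value_ms >= external_sla_ms:
        return "critical"
    if value_ms >= internal_slo_ms:
        return "warning"
    return "ok"


# Hypothetical request latencies in milliseconds: mostly fast, two slow outliers.
latencies_ms = [120] * 18 + [900, 1500]

INTERNAL_SLO_MS = 500   # assumed internal objective
EXTERNAL_SLA_MS = 1000  # assumed externally promised limit

avg = mean(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

for name, value in [("avg", avg), ("p95", p95), ("p99", p99)]:
    level = severity(value, INTERNAL_SLO_MS, EXTERNAL_SLA_MS)
    print(f"{name}: {value} ms -> {level}")
```

With these made-up numbers, the average stays well under both thresholds while the 95th and 99th percentiles cross them, which is the kind of gap an average-only view would hide.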

Company
Datadog

Date published
April 8, 2019

Author(s)
Emily Chang

Word count
2127

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.