/plushcap/analysis/datadog/watchdog

Watchdog: Auto-detect performance anomalies without setting alerts

What's this blog post about?

Watchdog is an auto-detection engine by Datadog that surfaces performance problems in applications without any manual setup or configuration. It uses machine learning algorithms to automatically detect issues such as latency spikes, elevated error rates, and network issues across microservices, endpoints, and cloud zones. The tool creates a feed of "stories" with notable findings, each highlighting the affected resource, timeframe, and impact duration. Each story links to a detail page that provides further context and aggregates performance statistics for the specific service or resource. Watchdog also identifies likely culprits by surfacing related behavior and common stack traces. It detects network issues in cloud infrastructure and helps identify the underlying cause of anomalies. The tool builds on Datadog's existing machine learning features to provide high-quality results tailored for large-scale infrastructure and applications, with plans to add more algorithms for root cause analysis and Kubernetes anomaly detection.

Company
Datadog

Date published
July 12, 2018

Author(s)
Brad Menezes

Word count
512

Hacker News points
3

Language
English


By Matt Makai. 2021-2024.