
End-to-end reliability testing with PagerDuty & Datadog

What's this blog post about?

Ashwin Jiwane, Software Engineer at PagerDuty, explains how the company uses data to improve the reliability of its notification pipeline and ensure that alerts reach customers on time. By combining Datadog and PagerDuty, the team has built an End-to-End Third-Party Provider testing practice that proactively detects outages in third-party providers' systems and quickly switches to a replacement provider to minimize or avoid customer impact. The setup uses three phones, each on a different mobile carrier network, and an internally built mobile app to send SMS alerts. The time each SMS takes to reach its designated phone is measured, and if it exceeds the acceptable threshold, a PagerDuty alert goes to the on-call engineer, who changes the providers' priority levels accordingly. This approach limits the impact of provider failures on customers and ensures that third-party vendors are continuously tested for reliability.
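
The check described above can be sketched roughly as follows. This is a minimal illustration, not PagerDuty's actual implementation: the 60-second threshold, the `Provider` type, the provider names, and the helper functions `check_delivery` and `demote` are all hypothetical, and the summary does not say how priorities are represented. In the practice described, an on-call engineer changes priorities after being paged; the `demote` function only models the outcome of that decision.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: the source does not state the real value.
LATENCY_THRESHOLD_SECONDS = 60.0


@dataclass
class Provider:
    name: str
    priority: int  # assumption: a lower number means the provider is tried first


def check_delivery(sent_at: float, received_at: Optional[float],
                   threshold: float = LATENCY_THRESHOLD_SECONDS) -> bool:
    """Return True if the test SMS arrived within the threshold."""
    if received_at is None:  # the SMS never reached the test phone
        return False
    return (received_at - sent_at) <= threshold


def demote(providers: List[Provider], failing: str) -> List[Provider]:
    """Return a new list in which the failing provider is tried last."""
    ordered = sorted(providers, key=lambda p: p.priority)
    healthy = [p for p in ordered if p.name != failing]
    failed = [p for p in ordered if p.name == failing]
    # Renumber priorities from 1 so the ordering stays contiguous.
    return [Provider(p.name, i + 1) for i, p in enumerate(healthy + failed)]


if __name__ == "__main__":
    providers = [Provider("provider_a", 1), Provider("provider_b", 2)]
    # Simulated measurement: sent at t=0 s, received at t=95 s.
    if not check_delivery(sent_at=0.0, received_at=95.0):
        providers = demote(providers, "provider_a")
    print([(p.name, p.priority) for p in providers])
```

Treating a message that never arrives the same way as a late one keeps the check conservative: both cases mean customers would not have been notified in time.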

Company
Datadog

Date published
July 30, 2014

Author(s)
Ashwin Jiwane

Word count
542

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.