End-to-end reliability testing with PagerDuty & Datadog
Ashwin Jiwane, a Software Engineer at PagerDuty, explains how the company uses data to improve the reliability of its notification pipeline and to ensure that alerts reach customers on time. By combining Datadog and PagerDuty, the team built an End-to-End Third-Party Provider testing practice that proactively detects outages in third-party providers' systems, so that engineers can quickly switch to a replacement provider and minimize or avoid customer impact. The setup uses three phones, each on a different mobile carrier network, and an internally built mobile app to send SMS alerts. The time each SMS takes to reach its designated phone is measured. If it exceeds acceptable thresholds, PagerDuty alerts the on-call engineer, who then adjusts the providers' priority levels accordingly. This approach limits the impact of provider failures on customers and ensures that third-party vendors are tested for reliability on an ongoing basis.
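The threshold check and the priority switch described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PagerDuty's actual implementation: the carrier and provider names, the 60-second threshold, and the helpers `slow_carriers` and `demote_provider` are all hypothetical. The paging step is represented only by a printed message, and deciding which provider to demote is left to the on-call engineer, as in the described process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str      # hypothetical SMS provider name
    priority: int  # lower value = tried first when sending SMS


def slow_carriers(delivery_seconds: dict, threshold_seconds: float) -> list:
    """Return the carriers whose measured SMS delivery time exceeded the threshold."""
    return sorted(c for c, secs in delivery_seconds.items() if secs > threshold_seconds)


def demote_provider(providers: list, failing: str) -> list:
    """Move the failing provider to the back of the priority order and renumber."""
    ordered = sorted(providers, key=lambda p: p.priority)
    healthy = [p for p in ordered if p.name != failing]
    broken = [p for p in ordered if p.name == failing]
    return [Provider(p.name, i) for i, p in enumerate(healthy + broken)]


if __name__ == "__main__":
    # Hypothetical delivery times (seconds) from three test phones, one per carrier.
    measured = {"carrier_a": 4.2, "carrier_b": 95.0, "carrier_c": 6.8}
    late = slow_carriers(measured, threshold_seconds=60.0)  # assumed threshold
    if late:
        # In the described setup, a PagerDuty alert reaches the on-call engineer
        # here; no real paging API is called in this sketch.
        print("page on-call: slow delivery on", late)

    # The engineer decides which provider to demote; assumed here to be sms_provider_1.
    providers = [Provider("sms_provider_1", 0), Provider("sms_provider_2", 1)]
    print(demote_provider(providers, "sms_provider_1"))
```

Demoting rather than removing a provider keeps it available as a fallback, which matches the summary's description of switching priority levels instead of dropping vendors.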
Company
Datadog
Date published
July 30, 2014
Author(s)
Ashwin Jiwane
Word count
542
Language
English
Hacker News points
None found.