
3 lessons learned from an Elasticsearch game day

What's this blog post about?

Datadog ran a game day on one of its Elasticsearch clusters to test the resilience of its systems. The team stopped Elasticsearch on several nodes, including the leader node and the client nodes for recent and long-term data, and observed how their applications responded. The three lessons learned were to be prepared for 503 errors during leader election, to handle dangling indices, and to implement health checks for client nodes. Game day exercises are an effective way to test a system's fault tolerance and to improve alerts and fixes.

Company
Datadog

Date published
Dec. 16, 2020

Author(s)
Aaditya Talwai, Emily Chang

Word count
1542

Language
English

Hacker News points
1


By Matt Makai. 2021-2024.