Datadog recently ran a game day on one of its Elasticsearch clusters to test the resilience of its systems. The team stopped Elasticsearch on several nodes, including the leader node and the client nodes for recent and long-term data, and observed how the applications that depend on the cluster responded. The lessons learned include being prepared for 503 responses during leader election, handling dangling indices, and implementing health checks for client nodes. Game day exercises are a practical way to test a system's fault tolerance and to improve alerts and fixes.
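
One way an application might prepare for 503 responses during leader election is to retry with exponential backoff instead of failing on the first error. The following is a minimal sketch of that idea, not Datadog's implementation: the function name `retry_on_503`, its parameters, and the `send` callable (standing in for whatever issues the request to a client node) are all hypothetical.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body); hypothetical shape


def retry_on_503(
    send: Callable[[], Response],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-503 status or attempts run out.

    Assumption: a 503 is transient (for example, no leader elected yet),
    so waiting and retrying is reasonable. Other statuses are returned as-is.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    status, body = send()
    for attempt in range(1, attempts):
        if status != 503:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the backoff schedule can be checked without real waiting. If every attempt returns 503, the last 503 response is handed back to the caller, which can then decide whether to alert or fail.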