Debugging war story: the mystery of NXDOMAIN
The blog post recounts a debugging effort on Cloudflare's Mesos-based cluster, which is used mainly to process log data and detect attacks. Engineers found that internal DNS queries were intermittently returning "no such host" errors for domains that did exist. After extensive testing and analysis, they traced the problem to packet loss during DNS resolution. The fix was to raise the number of retries configured in resolv.conf, so that the resolver tolerates transient network issues and resolves names more reliably.
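As a rough illustration of why more retries help, the sketch below estimates how often a lookup fails outright when every attempt is lost. It assumes each attempt is lost independently with the same probability, which real networks only approximate. The function name and the 1% loss rate are hypothetical and do not come from the post. In glibc's resolv.conf the retry count is set with `options attempts:n`, which defaults to 2 and is capped at 5; treating that as the option the post changed is an assumption here.

```python
def lookup_failure_probability(loss_rate: float, attempts: int) -> float:
    """Probability that all `attempts` tries of one query are lost.

    Simplified model (an assumption, not from the post): each attempt is
    lost independently with probability `loss_rate`.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return loss_rate ** attempts


if __name__ == "__main__":
    # Hypothetical 1% loss per attempt; compare glibc's default (2) with its cap (5).
    for attempts in (2, 5):
        probability = lookup_failure_probability(0.01, attempts)
        print(f"attempts={attempts}: failure probability {probability:.2e}")
```

Under this model each extra attempt multiplies the failure probability by the loss rate, so even a small increase in the retry count sharply reduces spurious lookup failures, at the cost of a longer worst-case wait when the server is truly unreachable.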
Company
Cloudflare
Date published
Dec. 7, 2016
Author(s)
Ivan Babrou
Word count
1621
Hacker News points
None found.
Language
English