Debugging TCP socket leak in a Kubernetes cluster
The author experienced network connectivity issues in a Kubernetes cluster running on Google Kubernetes Engine (GKE): delayed API responses and connection refused errors, particularly when the response body was large. Investigation showed that one particular node was running out of TCP stack memory. The finding led to a discussion of the kubelet's responsibility for monitoring the health of a node, which covers CPU, RAM, and disk usage but not network health. The author filed an issue with Kubernetes asking it to consider monitoring tcp_mem statistics as well.
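To illustrate what monitoring tcp_mem could look like, here is a minimal sketch, assuming a Linux node where `/proc/sys/net/ipv4/tcp_mem` holds the three kernel thresholds (low, pressure, high) and the `TCP:` line of `/proc/net/sockstat` reports current usage in its `mem` field, both counted in pages. The function names are hypothetical and are not taken from the article or from the kubelet; the summary does not state which commands the author actually ran.

```python
from pathlib import Path


def parse_tcp_mem_limits(text):
    """Parse the contents of /proc/sys/net/ipv4/tcp_mem (three page counts)."""
    low, pressure, high = (int(value) for value in text.split())
    return {"low": low, "pressure": pressure, "high": high}


def parse_sockstat_tcp_pages(text):
    """Return the 'mem' value (pages) from the TCP line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]  # alternating key/value tokens
            stats = dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
            return stats["mem"]
    raise ValueError("no TCP line found in sockstat text")


def tcp_mem_status(used_pages, limits):
    """Classify usage against the kernel thresholds (hypothetical labels)."""
    if used_pages >= limits["high"]:
        return "exhausted"
    if used_pages >= limits["pressure"]:
        return "pressure"
    return "ok"


if __name__ == "__main__":
    # Reads live values; only works on a Linux host with procfs mounted.
    limits = parse_tcp_mem_limits(Path("/proc/sys/net/ipv4/tcp_mem").read_text())
    used = parse_sockstat_tcp_pages(Path("/proc/net/sockstat").read_text())
    print(f"TCP memory: {used} pages, limits {limits}, status {tcp_mem_status(used, limits)}")
```

Run on each node, a check like this would flag the one whose TCP memory usage approaches the high threshold, which is the kind of signal the filed issue asks Kubernetes to consider.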
Company
Hasura
Date published
April 17, 2018
Author(s)
Shahidh K Muhammed
Word count
996
Hacker News points
None found.
Language
English