Company
Date Published
April 17, 2018
Author
Shahidh K Muhammed
Word count
996
Language
English
Hacker News points
None

Summary

The author encountered network connectivity issues in a Kubernetes cluster running on Google Kubernetes Engine (GKE): API responses were delayed and requests failed with connection refused errors, particularly when the response body was large. Investigation traced the problem to a single node that was running out of TCP stack memory. This prompted a discussion about the kubelet's responsibility for monitoring the health of a node, which covers CPU, RAM, and disk usage but not network health. The author filed an issue with Kubernetes proposing that tcp_mem statistics be monitored as well.
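
As a rough illustration of the kind of check the summary describes, the sketch below compares a node's current TCP memory usage with the kernel's tcp_mem thresholds. It assumes a Linux node where `/proc/net/sockstat` reports TCP memory in pages on its `TCP:` line and `/proc/sys/net/ipv4/tcp_mem` holds the three thresholds (low, pressure, high), also in pages. The function names and the state labels ("ok", "pressure", "exhausted") are hypothetical, chosen for this example; they are not taken from the original post or from the kubelet.

```python
from pathlib import Path


def parse_tcp_mem(text):
    """Parse the contents of /proc/sys/net/ipv4/tcp_mem.

    The file holds three integers, in pages: low, pressure, high.
    """
    low, pressure, high = (int(value) for value in text.split())
    return low, pressure, high


def parse_sockstat_tcp_pages(text):
    """Return the 'mem' value (in pages) from the TCP line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]  # alternating key/value tokens
            stats = dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
            return stats["mem"]
    raise ValueError("no TCP line found in sockstat data")


def tcp_memory_state(used, low, pressure, high):
    """Classify usage against the thresholds (labels are illustrative only)."""
    if used >= high:
        return "exhausted"
    if used >= pressure:
        return "pressure"
    return "ok"


def read_node_state(proc_root="/proc"):
    """Read both files from a live Linux node and classify TCP memory usage."""
    root = Path(proc_root)
    low, pressure, high = parse_tcp_mem(
        (root / "sys/net/ipv4/tcp_mem").read_text()
    )
    used = parse_sockstat_tcp_pages((root / "net/sockstat").read_text())
    return tcp_memory_state(used, low, pressure, high)
```

On a Linux node, calling `read_node_state()` returns one of the three labels. A monitoring agent could run a check like this periodically and flag the node once it leaves the "ok" state.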