Debugging TCP socket leak in a Kubernetes cluster

What's this blog post about?

The author experienced network connectivity issues in a Kubernetes cluster running on Google Kubernetes Engine (GKE): delayed API responses and connection refused errors, particularly for responses with larger bodies. After investigating, they found that one particular node was running out of TCP stack memory. The finding prompted a discussion about kubelet's responsibility for monitoring the health of a node, which covers CPU, RAM, and disk usage but not network health. The author filed an issue with Kubernetes asking that tcp_mem statistics be monitored as well.
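
To make the tcp_mem idea concrete, here is a minimal sketch of how TCP stack memory on a Linux node could be compared against its limits. It is not the author's method or anything kubelet does; it only assumes the standard Linux files `/proc/sys/net/ipv4/tcp_mem` (three page counts: low, pressure, high) and `/proc/net/sockstat` (whose `TCP:` line carries a `mem` field, also in pages). The function names and the `ok`/`pressure`/`exhausted` labels are hypothetical, chosen for illustration.

```python
from pathlib import Path


def parse_tcp_mem(text):
    """Parse the three page counts from /proc/sys/net/ipv4/tcp_mem."""
    low, pressure, high = (int(v) for v in text.split())
    return {"low": low, "pressure": pressure, "high": high}


def parse_sockstat_tcp_mem(text):
    """Return the 'mem' field (in pages) from the TCP line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # The line is a flat list of key/value pairs: inuse N orphan N ... mem N
            pairs = dict(zip(fields[0::2], fields[1::2]))
            return int(pairs["mem"])
    raise ValueError("no TCP line found in sockstat text")


def tcp_mem_status(used_pages, limits):
    """Classify usage against the limits (labels are illustrative, not kernel terms)."""
    if used_pages >= limits["high"]:
        return "exhausted"
    if used_pages >= limits["pressure"]:
        return "pressure"
    return "ok"


def check_node():
    """Read the live values on a Linux node and report the status (hypothetical helper)."""
    limits = parse_tcp_mem(Path("/proc/sys/net/ipv4/tcp_mem").read_text())
    used = parse_sockstat_tcp_mem(Path("/proc/net/sockstat").read_text())
    return used, limits, tcp_mem_status(used, limits)
```

Calling `check_node()` on a node returns the pages in use, the configured limits, and the status label. The parsing functions take plain text, so they can also be fed output collected from a node by other means.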

Company
Hasura

Date published
April 17, 2018

Author(s)
Shahidh K Muhammed

Word count
996

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.