How to debug Kubernetes Pending pods and scheduling failures
Pending pods in Kubernetes are normal: the status simply means that the scheduler is still working on assigning them to nodes. However, if a pod remains Pending for an unusually long time, the cause is often a lack of available nodes or an unsatisfied scheduling predicate. In this post, we covered several reasons why the Kubernetes scheduler might have difficulty placing Pending pods, including:

1. Taints and tolerations: A taint marks a node so that only pods with a matching toleration can be scheduled on it. If every schedulable node carries taints that a pod does not tolerate, the pod will remain Pending until the taints are removed or the pod is given the necessary tolerations (see the first sketch after this list).

2. Node selectors: If a pod's node selector does not match the labels on any available node, the pod will remain Pending (see the node selector sketch below).

3. Resource requests and limits: If no node in the cluster can satisfy a pod's resource requests, the pod will remain Pending until capacity frees up or the pod is modified to request fewer resources (see the resource request sketch below).

4. PersistentVolumeClaims: If a pod's PersistentVolumeClaim cannot be bound to a compatible PersistentVolume because of a scheduling conflict (e.g., the chosen node and the volume sit in different zones), the pod will remain Pending (see the StorageClass sketch below).

5. Local volumes: If a pod's PersistentVolumeClaim is still bound to local storage on an unavailable node, the pod will remain Pending until that node becomes available again or the pod is deleted and recreated with a new PersistentVolumeClaim.

6. Inter-pod affinity and anti-affinity rules: These rules constrain where pods can (and cannot) be scheduled, based on the pods already running on each node. If the rules cannot be satisfied, the pod will remain Pending until the conflict is resolved (see the anti-affinity sketch below).

7. Rolling update deployment settings: During a rolling update, Kubernetes replaces pods progressively to reduce the likelihood of degrading the workload's availability. However, if the Deployment has fewer than four desired replicas while the default values of maxUnavailable and maxSurge are in effect, or if the maxUnavailable threshold is reached, the rollout may pause, leaving pods stuck Pending (see the rollout sketch below).

To troubleshoot Pending pods, you can use kubectl describe to see which scheduling predicates failed for an affected pod, and review events from your clusters (see the commands below). You can also set up automated alerts to detect issues in your workloads and use the stories that Datadog Watchdog automatically generates to help identify abnormalities in your clusters.
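For the taints-and-tolerations case (item 1), here is a minimal sketch; the node name, the dedicated=ml taint, and the pod and image names are hypothetical stand-ins, not values from the post:

```yaml
# Assume a node has been tainted so only tolerating pods land on it:
#   kubectl taint nodes node-1 dedicated=ml:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: ml-worker
spec:
  containers:
    - name: app
      image: nginx:1.25
  # Without this toleration, the pod stays Pending whenever every
  # schedulable node carries the dedicated=ml:NoSchedule taint.
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "ml"
      effect: "NoSchedule"
```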
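For node selectors (item 2), a pod can only be placed on nodes whose labels match its nodeSelector. A sketch, assuming a hypothetical disktype=ssd label:

```yaml
# If no Ready node is labeled disktype=ssd (e.g., via
#   kubectl label nodes node-1 disktype=ssd
# ), this pod remains Pending with a "didn't match Pod's node
# affinity/selector" scheduling event.
apiVersion: v1
kind: Pod
metadata:
  name: ssd-app
spec:
  nodeSelector:
    disktype: ssd
  containers:
    - name: app
      image: nginx:1.25
```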
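For resource requests (item 3), the scheduler only considers nodes with enough unreserved capacity to cover a pod's requests. A sketch with deliberately large, hypothetical request values:

```yaml
# If every node has less than 4 allocatable CPUs remaining (after
# accounting for the requests of pods already scheduled on it), this
# pod stays Pending with an "Insufficient cpu" event.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-hungry
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
```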
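For zone-mismatched PersistentVolumeClaims (item 4), one common mitigation is a StorageClass that delays volume binding until the pod is scheduled, so the volume gets provisioned in a zone where the pod can actually run. A sketch assuming a hypothetical StorageClass name and the in-tree GCE persistent disk provisioner:

```yaml
# With the default Immediate binding mode, a volume can be provisioned
# in a zone where the pod cannot be scheduled, leaving the pod Pending.
# WaitForFirstConsumer defers binding until a consuming pod exists.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware-ssd
provisioner: kubernetes.io/gce-pd   # assumption: GCE persistent disks
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
```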
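For anti-affinity (item 6), a hard (requiredDuringScheduling) rule can make some replicas unschedulable outright. A sketch, assuming a hypothetical Deployment with more replicas than nodes:

```yaml
# This anti-affinity rule forbids two app=web pods from sharing a node
# (topologyKey: kubernetes.io/hostname). On a 3-node cluster, the
# fourth replica has nowhere to go and stays Pending.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: nginx:1.25
```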
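For rolling update settings (item 7), the "fewer than four replicas" caveat follows from rounding: the default maxUnavailable of 25% rounds down (3 × 0.25 → 0 pods may be unavailable) while the default maxSurge of 25% rounds up (→ 1 extra pod), so a rollout stalls if that single surge pod cannot be scheduled. A sketch with hypothetical names that sets the values explicitly:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # allow one pod below the desired count
      maxSurge: 1         # allow one extra pod during the rollout
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: app
          image: nginx:1.25
```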
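And for the troubleshooting step, these are the standard commands; the pod name is a placeholder:

```shell
# List pods stuck in the Pending phase.
kubectl get pods --field-selector=status.phase=Pending

# The Events section at the bottom of the output reports the failed
# scheduling predicates, e.g. "0/5 nodes are available: 5 Insufficient cpu".
kubectl describe pod <pending-pod-name>

# Cluster-wide events, most recent last, are also worth scanning.
kubectl get events --sort-by=.metadata.creationTimestamp
```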
Company: Datadog
Date published: May 20, 2021
Author(s): Emily Chang