Repairing Failed Pods

Jonathan Peña

In this hands-on exercise, we troubleshoot a Kubernetes deployment issue in the web namespace. Starting with an overview of all resources, we find that a pod is stuck in the ImagePullBackOff state, a common problem when the specified container image is unavailable or incorrectly tagged. The exercise covers essential kubectl commands and reinforces key skills: diagnosing pod issues, editing deployments, and validating changes in a live cluster.

  1. We will list every resource across all namespaces to get a glimpse of our infrastructure:
    kubectl get all --all-namespaces
Note that the output includes a web namespace.
  1. Next, we narrow the view to the web namespace:
    kubectl get svc,po,deploy -n web
  2. This returns the services, pods, and deployments within the web namespace.
We can see here that the pod status is ImagePullBackOff.
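In a busy namespace, picking out the failing pods by eye gets tedious. As a minimal sketch, the helper below reads kubectl get po style output and prints only the pods whose STATUS column shows an image pull failure. The name failing_image_pods is made up for this example and is not part of kubectl, and the function assumes the default column layout of NAME, READY, STATUS, RESTARTS, AGE.

```shell
# Hypothetical helper: print the names of pods whose STATUS column
# reports an image pull failure.
# Expects `kubectl get po` style output on stdin, with the default
# columns NAME READY STATUS RESTARTS AGE (an assumption of this sketch).
failing_image_pods() {
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}

# Usage against a live cluster (assumes kubectl points at that cluster):
#   kubectl get po -n web | failing_image_pods
```

ErrImagePull is included because a pod typically alternates between that status and ImagePullBackOff while the kubelet retries the pull.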
  1. To investigate, we describe one of the affected pods. Let's take the first one:
    kubectl describe pod nginx-856876659f-f9cqq -n web
As the output shows, there is a clear issue with the image nginx:191.

We need to correct the pod image according to the error messages. The image is defined in the deployment, so we edit the deployment rather than the pod itself:
kubectl edit deploy nginx -n web

In this case, we remove the :191 tag and leave the image as plain nginx, which resolves to the latest tag.
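If we would rather not open an editor, the same change can likely be made non-interactively with kubectl set image. The sketch below uses a hypothetical helper, strip_image_tag, to drop the tag from an image reference. The container name nginx in the usage comment is an assumption, so check the deployment spec for the real container name first.

```shell
# Hypothetical helper: drop a trailing :tag from an image reference.
# Only the final path component is inspected, so a registry port such as
# localhost:5000/nginx is left alone. Digest references (name@sha256:...)
# are returned unchanged.
strip_image_tag() {
  image=$1
  case $image in
    *@*) printf '%s\n' "$image"; return 0 ;;
  esac
  last=${image##*/}                        # e.g. nginx:191
  case $last in
    *:*) printf '%s\n' "${image%:*}" ;;    # remove everything from the last colon
    *)   printf '%s\n' "$image" ;;
  esac
}

# Usage (the container name "nginx" is an assumption of this sketch):
#   kubectl set image deployment/nginx nginx="$(strip_image_tag nginx:191)" -n web
```

Pinning a known-good version tag instead of falling back to the untagged image is usually the safer choice for anything beyond an exercise.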
  1. In the editor opened by kubectl edit deploy nginx -n web, delete the :191, press Escape, and type :wq! to save and exit. Saving the change triggers a new rollout.
  2. Now, let's verify that the change has taken effect with kubectl get rs -n web. We should see a new replica set alongside the old one.
The new replica set has an age of 5 minutes and 43 seconds; the old replica set has an age of 24 minutes.
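To make this check less dependent on comparing ages by eye, here is a small sketch that filters kubectl get rs output down to the replica sets that are actually serving pods. The helper name active_replicasets is hypothetical, and it assumes the default columns NAME, DESIRED, CURRENT, READY, AGE.

```shell
# Hypothetical helper: print replica sets that want at least one pod
# and have all of them ready.
# Expects `kubectl get rs` style output on stdin, with the default
# columns NAME DESIRED CURRENT READY AGE (an assumption of this sketch).
active_replicasets() {
  awk 'NR > 1 && $2 > 0 && $4 == $2 { print $1 }'
}

# Usage against a live cluster:
#   kubectl get rs -n web | active_replicasets
# kubectl can also wait for the rollout itself to finish:
#   kubectl rollout status deployment/nginx -n web
```

After a successful fix, only the new replica set should be printed, since the old one is scaled down to zero desired pods.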
  1. Let's list our pods along with their IP addresses:
    kubectl get po -n web -o wide

Now let's spin up a temporary BusyBox pod to test the health of one of these pods.
kubectl run busybox --image=busybox --rm -it --restart=Never -- sh

  1. From the BusyBox shell, we call the first pod at its IP address:
    wget -qO- 10.244.2.12:80
  1. With this, we can conclude that we have fixed the broken pods and can connect to nginx successfully.
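The manual check can also be run in one shot and tested automatically. The sketch below assumes the pod serves the stock nginx welcome page, which contains the text "Welcome to nginx!"; a customized page would need a different pattern. The helper name is_nginx_welcome is made up for this example.

```shell
# Hypothetical helper: succeed if the text on stdin looks like the
# default nginx welcome page (assumes the stock index page is served).
is_nginx_welcome() {
  grep -q 'Welcome to nginx!'
}

# Usage: run wget from a throwaway BusyBox pod and check the response.
# The pod IP is the one from the walkthrough and will differ per cluster.
#   kubectl run busybox --image=busybox --rm -i --restart=Never -- \
#     wget -qO- 10.244.2.12:80 | is_nginx_welcome && echo "nginx is serving"
```

The usage drops the -t flag because a TTY is not available when the output is piped.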

