How can I debug "ImagePullBackOff"?
All of a sudden, I cannot deploy some images that could be deployed before. I get the following pod status:

[root@webdev2 origin]# oc get pods 
NAME                      READY     STATUS             RESTARTS   AGE 
arix-3-yjq9w              0/1       ImagePullBackOff   0          10m 
docker-registry-2-vqstm   1/1       Running            0          2d 
router-1-kvjxq            1/1       Running            0          2d 

The application just won't start. The pod does not even try to run the container. On the Events page, I get Back-off pulling image "172.30.84.25:5000/default/arix@sha256:d326. I have verified that I can pull the image with its tag using docker pull.

I have also checked the log of the last container; it was closed for some reason. I think the pod should at least try to restart it.

I have run out of ideas for debugging the issue. What else can I check?

Palestrina answered 18/1, 2016 at 6:24 Comment(6)
Is this a multi-machine setup? If so, verify that you can pull from all nodes. If not, turn logging up to --loglevel=5 on the node and restart; you should see information describing the attempt to pull the image, including any errors. – Canadian
What came out after restarting with --loglevel=5? – Jeffrey
Did you solve the problem? Can someone explain this 'ImagePullBackOff' issue? (The images exist in my 'docker images'.) – Torry
I got this by using the wrong region for my repo. I forgot to add eu. to --image=eu.gcr.io/$PROJECT_ID/... – Lezlie
In my case, the wrong tag name was being passed for the image. Changing the tag name solved the issue. – Akel
It's a good read; it solved my problem totally: tutorialworks.com/kubernetes-imagepullbackoff – Casserole

You can use the 'describe pod' syntax

For OpenShift use:

oc describe pod <pod-id>  

For vanilla Kubernetes:

kubectl describe pod <pod-id>  

Examine the events in the output. In my case, it shows Back-off pulling image "unreachableserver/nginx:1.14.22222".

In this case, the image unreachableserver/nginx:1.14.22222 cannot be pulled from the Internet, because there is no Docker registry user unreachableserver and the image nginx:1.14.22222 does not exist.

NB: If you do not see any events of interest and the pod has been in the 'ImagePullBackOff' status for a while (it seems to be more than 60 minutes), you need to delete the pod and look at the events from the new pod.

For OpenShift use:

oc delete pod <pod-id>
oc get pods
oc get pod <new-pod-id>

For vanilla Kubernetes:

kubectl delete pod <pod-id>  
kubectl get pods
kubectl get pod <new-pod-id>

Sample output:

  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  32s                default-scheduler  Successfully assigned rk/nginx-deployment-6c879b5f64-2xrmt to aks-agentpool-x
  Normal   Pulling    17s (x2 over 30s)  kubelet            Pulling image "unreachableserver/nginx:1.14.22222"
  Warning  Failed     16s (x2 over 29s)  kubelet            Failed to pull image "unreachableserver/nginx:1.14.22222": rpc error: code = Unknown desc = Error response from daemon: pull access denied for unreachableserver/nginx, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     16s (x2 over 29s)  kubelet            Error: ErrImagePull
  Normal   BackOff    5s (x2 over 28s)   kubelet            Back-off pulling image "unreachableserver/nginx:1.14.22222"
  Warning  Failed     5s (x2 over 28s)   kubelet            Error: ImagePullBackOff

Additional debugging steps

  1. Try to pull the Docker image and tag manually on your computer.
  2. Identify the node by running kubectl get pods -o wide (or oc get pods -o wide).
  3. SSH into the node (if you can) that cannot pull the Docker image.
  4. Check that the node can resolve the DNS name of the Docker registry by performing a ping.
  5. Try to pull the Docker image manually on the node.
  6. If you are using a private registry, check that your secret exists and that the secret is correct. The secret should also be in the same namespace. Thanks swenzel.
  7. Some registries have firewalls that limit IP address access. The firewall may block the pull.
  8. Some CIs create deployments with temporary Docker secrets, so the secret expires after a few days (you are asking for production failures...).
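
Several of the steps above can be sketched as a short shell session. The image name, secret name, and namespace below (unreachableserver/nginx:1.14.22222, my-pull-secret, my-namespace) are placeholders; substitute your own:

```shell
# Step 1: try the pull locally; a clear error message here usually explains the back-off
docker pull unreachableserver/nginx:1.14.22222

# Step 2: find the node the pod was scheduled on (NODE column)
kubectl get pods -o wide

# Step 4 (run on the node): check that the registry's hostname resolves
ping -c 3 registry-1.docker.io

# Step 6: for a private registry, the pull secret must exist in the pod's namespace
kubectl get secret my-pull-secret -n my-namespace
```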
Blast answered 24/5, 2017 at 6:40 Comment(5)
Also, in case you use a private image repository, make sure your image pull secrets exist, have no typos, and are in the right namespace. – Stretcherbearer
In the case of a private image repository, also make sure that you reference the image pull secrets in your pod using the "imagePullSecrets" entry. – Juicy
There is also a lengthy blog post describing how to debug this in depth here: managedkube.com/kubernetes/k8sbot/troubleshooting/… – Equate
These instructions are out of date; Kubernetes no longer provides detailed information on ImagePullBackOff. – Seminole
@KirkSefchik, I think I figured out why you do not see the detailed information. I have updated my answer, thanks. – Blast

Try editing the pod to see what's wrong (I had the wrong image location):

kubectl edit pods arix-3-yjq9w

Or even delete your pod:

kubectl delete pod arix-3-yjq9w
Lezlie answered 12/10, 2016 at 12:48 Comment(0)

I faced a similar situation, and it turned out that with the actualisation (update) of Docker Desktop I had been signed out. After I signed back in, everything worked fine again.

Crosspollinate answered 13/1, 2021 at 10:38 Comment(2)
What is "actualisation" (in this context)? Can you elaborate? – Gelt
An update to a newer version. – Crosspollinate

I ran into this issue on Google Kubernetes Engine (GKE), and the reason was no credentials for Docker.

Running this resolved it:

gcloud auth configure-docker
Theona answered 7/2, 2021 at 14:53 Comment(0)

On GKE, if the pod is dead, it's best to check the events. They will show in more detail what the error is about.

In my case, I had:

Failed to pull image "gcr.io/project/imagename@sha256:c8e91af54fc17faa1c49e2a05def5cbabf8f0a67fc558eb6cbca138061a8400a":
 rpc error: code = Unknown desc = error pulling image configuration: unknown blob

It turned out the image was damaged somehow. After repushing it and deploying with the new hash, it worked again.

In retrospect, I think the images got damaged because the bucket in GCP that hosts them had a cleanup policy set on it, which basically removed the images. As a result, the message above shows up in the events.

Other common issues are a wrong name (gcr.io vs. eu.gcr.io), and it can also be that the registry cannot be reached somehow. Again, the hints are in the events; the message there should tell you enough.
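
A minimal sketch of the repush-and-redeploy fix. Here gcr.io/my-project/imagename, my-deployment, and my-container are placeholder names, and the digest printed by docker will differ in your case:

```shell
# Rebuild and push the image; the registry stores it under a new sha256 digest
docker build -t gcr.io/my-project/imagename:latest .
docker push gcr.io/my-project/imagename:latest

# Read the digest back, then roll the deployment to the new immutable reference
docker inspect --format '{{index .RepoDigests 0}}' gcr.io/my-project/imagename:latest
kubectl set image deployment/my-deployment my-container=gcr.io/my-project/imagename@sha256:<new-digest>
```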

More general information can be found here (like for authentication):

Pushing and pulling images

Beater answered 10/7, 2020 at 9:38 Comment(0)

I forgot to push the image tagged 1.0.8 to ECR (the AWS image hub)... If you are using Helm and upgrade with:

helm upgrade minta-user ./src/services/user/helm-chart

make sure that the image tag inside the file values.yaml is pushed (to ECR, Docker Hub, etc.). For example (this is my helm-chart/values.yaml):

replicaCount: 1

image:
   repository: dkr.ecr.us-east-1.amazonaws.com/minta-user
   tag: 1.0.8

You need to make sure that the image with tag 1.0.8 is pushed!
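
One way to verify that the tag really exists in ECR (assuming the repository is named minta-user and the AWS CLI is configured for the right account and region):

```shell
# Prints image metadata only if the tag exists;
# fails with ImageNotFoundException otherwise
aws ecr describe-images \
    --repository-name minta-user \
    --image-ids imageTag=1.0.8
```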

Tannenberg answered 7/5, 2019 at 11:11 Comment(1)
Where do you find this file? – Whitnell

Run the command below:

eval $(minikube -p minikube docker-env)

Now build your images; then use those same images in Kubernetes. Do this every time you open a new command-line window.
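
A sketch of the full workflow, assuming a hypothetical image called my-app:dev. Because the image only exists inside minikube's Docker daemon, the pod must not try to pull it from a registry, hence imagePullPolicy=Never:

```shell
# Point the docker CLI at minikube's internal Docker daemon (this shell only)
eval $(minikube -p minikube docker-env)

# Build straight into minikube's image store
docker build -t my-app:dev .

# Run without pulling; Never forces use of the locally built image
kubectl run my-app --image=my-app:dev --image-pull-policy=Never
```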

Thermoelectric answered 12/3, 2021 at 13:52 Comment(1)
In what context? Linux? Inside the Docker container? Somewhere else? – Gelt

In my case, using a Fargate profile, I had the networking in my VPC configured incorrectly. The Fargate containers require access to ECR, which requires a route to the public Internet.

I had the NAT Gateways for my private subnets located in those same private subnets, when they should have been located in public subnets. This error message was the result of that misconfiguration in my case.
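
One way to spot this misconfiguration with the AWS CLI; subnet-0123456789abcdef0 is a placeholder for one of the private subnets in the Fargate profile:

```shell
# Inspect the route table associated with the private subnet; its
# 0.0.0.0/0 route should target a NAT gateway that lives in a *public* subnet
aws ec2 describe-route-tables \
    --filters Name=association.subnet-id,Values=subnet-0123456789abcdef0 \
    --query 'RouteTables[].Routes'
```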

Anselme answered 2/7, 2021 at 13:37 Comment(1)
How would this happen on one of your nodes? – Rheumatism

Make sure your repository is publicly accessible (or that the cluster has pull credentials for it). Mine was set to private and was giving the "ImagePullBackOff" status.

Melisent answered 28/2, 2022 at 16:15 Comment(1)
How could this happen on one node? – Rheumatism

If the image is not compatible with the node (for example, it was built for a different CPU architecture), you will also get the Back-off pulling image error. That happened in my case.

I verified it by running docker pull on the k8s node.

Sibylle answered 1/6, 2022 at 13:13 Comment(0)

I had the same issue when using the Kubernetes cluster of Docker Desktop. First I got "ErrImagePull", then "ImagePullBackOff". The error details were "rpc error: code = Unknown desc = context deadline exceeded". I was able to resolve it by running docker pull <imageNameOfContainerWithinFailingPod> on my machine. I guess something times out while Kubernetes pulls the image.

Hendecasyllable answered 16/3, 2023 at 17:15 Comment(0)

For Ubuntu lab setups, go to your worker node and edit the following file:

sudo vi /etc/resolv.conf

Add nameserver 8.8.8.8 and save the file; it will work for labs.
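
To confirm that DNS is actually the problem before editing anything, you can check whether the node resolves the registry's hostname (registry-1.docker.io is Docker Hub's registry endpoint; substitute your own registry):

```shell
# If this fails, the node cannot resolve the registry and image pulls will back off
nslookup registry-1.docker.io
```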

Threeply answered 31/3, 2022 at 4:5 Comment(0)

I was facing a similar problem, but instead of one pod, all of my pods were not ready and displayed a Ready status of 0/1.


I tried a lot of things, but at last I found that the context was not set correctly.

Please use the following command and ensure you are in the correct context:

kubectl config get-contexts
Shakespeare answered 1/7, 2020 at 4:45 Comment(1)
Your context only governs which cluster you're connected to. This answer is incorrect. – Seminole

Steps:

  • Run docker login.

  • Push the image to Docker Hub.

  • Recreate the pod.

This solved the problem for me.
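
The three steps can be sketched as follows; mydockerid/my-app and my-app-pod are placeholder names:

```shell
# Authenticate against Docker Hub
docker login

# Tag and push the image so the cluster can actually find it
docker tag my-app:latest mydockerid/my-app:latest
docker push mydockerid/my-app:latest

# Recreate the pod so the kubelet retries the pull immediately
kubectl delete pod my-app-pod
```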

Oralee answered 11/11, 2018 at 17:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.