Kubernetes pod never gets ready

I am setting up a small Kubernetes cluster using a VM (master) and 3 bare-metal servers (all running Ubuntu 14.04). I followed the Kubernetes install tutorial for Ubuntu. Each bare-metal server also has 2 TB of disk space exported using Ceph 0.94.5. Everything is working fine, but when I try to start a Replication Controller I get the following (kubectl get pods):

NAME          READY     STATUS                                            RESTARTS   AGE
site2-zecnf   0/1       Image: site-img is ready, container is creating   0          12m

The pod stays in this Not Ready state forever, but if I kill it and start it again, it runs fine (sometimes I have to repeat this a few times, though). Once the pod is running, everything works just fine.
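
By "kill it" I mean deleting the pod and letting the Replication Controller create a replacement, for example:

kubectl delete pod site2-zecnf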

If, for some reason, the pod dies, it's restarted by Kubernetes, but it can enter this Not Ready state again. Running:

kubectl describe pod java-site2-crctv

I get (some fields deleted):

Namespace:          default
Status:             Pending
Replication Controllers:    java-site2 (1/1 replicas created)
Containers:
  java-site:
    Image:      javasite-img
    State:      Waiting
      Reason:       Image: javasite-img is ready, container is creating
    Ready:      False
    Restart Count:  0
Conditions:
  Type      Status
  Ready     False 
Events:
  FirstSeen             LastSeen            Count   From            SubobjectPath   Reason      Message
  Sat, 14 Nov 2015 12:37:56 -0200   Sat, 14 Nov 2015 12:37:56 -0200 1   {scheduler }                scheduled   Successfully assigned java-site2-crctv to 10.70.2.3
  Sat, 14 Nov 2015 12:37:57 -0200   Sat, 14 Nov 2015 12:45:29 -0200 46  {kubelet 10.70.2.3}         failedMount Unable to mount volumes for pod "java-site2-crctv_default": exit status 22
  Sat, 14 Nov 2015 12:37:57 -0200   Sat, 14 Nov 2015 12:45:29 -0200 46  {kubelet 10.70.2.3}         failedSync  Error syncing pod, skipping: exit status 22

The pod cannot mount the volume. But if I mount the volumes (RBD block devices) by hand into a local folder on all nodes, the problem goes away (the pods start without problems).

It seems to me that Kubernetes isn't able to map them (sudo rbd map java-site-vol), only to mount them (sudo mount /dev/rbd/rbd/java-site-vol /...).
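
For reference, this is roughly the manual workaround I run on each node (the mount point is just an example path):

# map the RBD image on the node, then mount the resulting block device
sudo rbd map java-site-vol --pool rbd
sudo mount /dev/rbd/rbd/java-site-vol /mnt/java-site-vol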

Should I map all Ceph volumes that I use or should Kubernetes do that?

Valiant asked 12/11, 2015 at 15:56 Comment(5)
Have you tried running kubectl describe pod on the pod in the Not Ready state? It may be clear from the event stream for the pod what is preventing it from running. Alternatively, you should look at /var/log/kubelet.log on the host where the pod is stuck in the Not Ready state to see if there is anything interesting in the logs. - Disband
I ran the command and updated the question. Thanks. - Valiant
I have had a similar issue: I believe it is related to a bug being worked on right now, where a failed container won't restart unless it is on the same node it originally started on, because the original storage mount is not available on a different node. - Hobard
I am using Ceph and the pods can mount blocks from any node. - Valiant
I answered the question. But I think the fact that the pods didn't always start was because the Ceph volumes were mounted (and mapped) outside Kubernetes on some nodes (but not on others). Should I delete this part of the question to make the whole question clearer? - Valiant

I finally solved the problem. In the YAML files describing the Replication Controllers, I was using keyring: in the volume section:

keyring: "ceph.client.admin.keyring" 

After I generated a Ceph secret and changed the YAML files to use secretRef:

secretRef:
  name: "ceph-secret"

Kubernetes was able to map and mount the Ceph volumes and the pods began to start normally. I don't know why using keyring: doesn't work in this case.
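
For completeness, here is a minimal sketch of what the working setup looks like; the monitor address, the base64 key, and fsType are placeholders, while the pool (rbd), image (java-site-vol), user (admin), and secret name (ceph-secret) are the values from my setup:

# Secret holding the Ceph admin key. The value under data.key must be the
# base64-encoded key taken from ceph.client.admin.keyring, e.g.:
#   grep key /etc/ceph/ceph.client.admin.keyring | awk '{printf "%s", $NF}' | base64
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: <base64-encoded key from ceph.client.admin.keyring>

# Volume section of the Replication Controller's pod template:
volumes:
  - name: java-site-vol
    rbd:
      monitors:
        - <ceph-monitor-ip>:6789   # placeholder, use your monitor address(es)
      pool: rbd
      image: java-site-vol
      user: admin
      secretRef:
        name: ceph-secret
      fsType: ext4                 # placeholder, whatever the image is formatted with
      readOnly: false

With secretRef in place, the kubelet on the node the pod is scheduled to performs the rbd map and the mount itself, so the volumes no longer need to be mapped by hand on the nodes.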

Valiant answered 16/11, 2015 at 2:50 Comment(0)
