NFS volume mount results in exit code 32 in Kubernetes?

I'm trying to mount an external NFS share in a Replication Controller. When I create the replication controller, the pod stays pending. Describing the pod shows these events:

Events:
  FirstSeen             LastSeen            Count   From            SubobjectPath   Reason      Message
  Thu, 05 Nov 2015 11:28:33 -0700   Thu, 05 Nov 2015 11:28:33 -0700 1   {scheduler }                scheduled   Successfully assigned web-worker-hthjq to jolt-server-5
  Thu, 05 Nov 2015 11:28:43 -0700   Thu, 05 Nov 2015 11:28:43 -0700 1   {kubelet jolt-server-5}         failedMount Unable to mount volumes for pod "web-worker-hthjq_default": exit status 32
  Thu, 05 Nov 2015 11:28:43 -0700   Thu, 05 Nov 2015 11:28:43 -0700 1   {kubelet jolt-server-5}         failedSync  Error syncing pod, skipping: exit status 32

My setup is one master and one node on local machines, both running CoreOS. The NFS share lives on another machine on the network. If I shell into the host machine, I can mount the NFS share successfully, so I believe the export is configured correctly. Looking around online, the only examples of NFS shares I can find are ones defined within Kubernetes (one pod sharing with another). Is there no way to mount an external share directly from Kubernetes? I don't want to have to mount it on the host machine and then mount from the host machine into the container.
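
For reference, the manual mount on the host looked roughly like this (the mount point is just an example):

# Run on the CoreOS node; the export path matches the manifest below.
sudo mkdir -p /mnt/jolt-web
sudo mount -t nfs thuey.jolt.local:/Users/thuey/Sites/jolt/jolt-web /mnt/jolt-web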

Here's my Replication Controller:

apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    name: web-worker
  name: web-worker
spec:
  replicas: 1
  selector:
    component: web-worker
  template:
    metadata:
      labels:
        app: task-queue
        component: web-worker
    spec:
      containers:
        - command:
            - /sbin/my_init
          image: quincy/php-apache2:latest
          name: web-worker
          ports:
            - containerPort: 80
              protocol: TCP
          volumeMounts:
            - name: code-base
              mountPath: /var/www/html
            - name: local-secrets
              mountPath: /secrets
      volumes:
        - name: code-base
          nfs:
            server: thuey.jolt.local
            path: "/Users/thuey/Sites/jolt/jolt-web"
        - name: local-secrets
          secret:
            secretName: local-secret

UPDATE

After thinking about it more, I suspect the problem is that the node can't resolve the server name (thuey.jolt.local); it's probably only consulting the cluster's internal DNS. Is that accurate? If so, is there any way around it?
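
If that's the case, a couple of sanity checks I can run from the node itself (as far as I understand, the kubelet performs the NFS mount on the host, so it's the host's resolver that matters, not the pod's):

# Run on the node (jolt-server-5); assumes nslookup and ping are available there.
nslookup thuey.jolt.local    # can the node resolve the NFS server's name?
ping -c 1 thuey.jolt.local   # is the server reachable from the node?

If name resolution is the problem, swapping server: thuey.jolt.local in the volume definition for the server's IP address should sidestep DNS entirely.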

UPDATE

After attempting this again, mounting directly from the pod to the networked NFS server now works.

Nessus asked 5/11, 2015 at 18:38 Comment(6)
Does your kubelet log show anything more specific than the pod events above? That'd be the next place I'd check. The error doesn't say which mount failed. You might also try taking out the secret mount while troubleshooting, to isolate the issue. Sorry, I don't have an actual answer. – Karlis
I removed the secret mount, and it still had the error. I also tried using only the secret mount, and that worked. What's the best way to check the kubelet log? I looked for /var/log/kubelet.log on the host machine, but it doesn't exist. – Nessus
To see the kubelet log on a systemd-based host (like CoreOS), this should work: journalctl -lu kubelet.service – Karlis
You could verify how the pod is attempting to look up the NFS server with 'kubectl exec web-worker-hthjq -- cat /etc/resolv.conf', and if you have nslookup or a similar command in the container: 'kubectl exec web-worker-hthjq -- nslookup thuey.jolt.local'. Usually the kubelet sets up resolv.conf to point at the SkyDNS addon (if it is configured) and then adds the worker node's own resolv.conf data to the containers. So if your host can look up and route to the NFS server, I'd expect the container to be set up for that as well. – Karlis
Checking resolv.conf confirmed that it's pointing at the local DNS server, so that's good, and nslookup succeeds. I don't know whether this is related, but I noticed an error on the CoreOS host machine that I hadn't seen before: when shelling in, it says that locksmithd.service failed, and journalctl shows: locksmithd.service: Failed with result 'resources'. So far, Google hasn't turned up much. – Nessus
I attempted mounting directly from the pod again, and now it's working. I'm not sure what changed, but I really appreciate your help; those commands were really useful. – Nessus

With @rwehner's help, I was finally able to figure it out. Checking the kubelet log revealed:

Output: mount.nfs: rpc.statd is not running but is required for remote locking.

As soon as I got rpcbind running, the mount worked.
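
For anyone else who lands here, this is roughly what "getting rpcbind running" involved; exact unit and package names depend on your distro, and on whether you need it on the node or inside the container (in my case it was the container):

# On a systemd host (e.g. a CoreOS node); unit names may vary by distribution.
sudo systemctl start rpcbind.service
sudo systemctl start rpc-statd.service   # rpc.statd handles NFS file locking

# Inside a container without systemd, starting the daemons directly is an option:
rpcbind
rpc.statd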

Nessus answered 9/11, 2015 at 23:55 Comment(3)
Where did you get statd running? Inside the pod container? – Manta
@FrankerZ It's been a while, but yes, if I remember correctly I had to get it running in the pod container. – Nessus
For reference, if anyone else comes here: serverfault.com/questions/887212/… – Manta
