Kubernetes DNS lookup issue and "invalid" in the /etc/resolv.conf file

I have deployed a Kubernetes cluster composed of a master and two workers using kubeadm and the Flannel network plugin (so I passed the --pod-network-cidr=10.244.0.0/16 flag to kubeadm init).
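
For reference, the bootstrap looked roughly like this (a sketch: the advertise address is an assumption based on the master's VPN IP listed below, and the Flannel manifest URL is the one commonly used at the time):

$ sudo kubeadm init --apiserver-advertise-address=10.0.0.170 --pod-network-cidr=10.244.0.0/16
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml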

Those nodes communicate with each other over a VPN, so that:

  • Master node IP address is 10.0.0.170
  • Worker 1 IP address is 10.0.0.247
  • Worker 2 IP address is 10.0.0.35

When I create a new pod and try to ping google.com, I get the following error:

/ # ping google.com
ping: bad address 'google.com'

I followed the instructions from the Kubernetes DNS debugging resolution documentation page. For reference, the busybox test pod used in all the checks below was created roughly as in that guide (a sketch; the guide pins busybox:1.28 because nslookup in newer busybox images is known to misbehave):
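
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

The first check from the guide fails: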

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

Check the local DNS configuration first

$ kubectl exec busybox cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local invalid
options ndots:5

Check if the DNS pod is running

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-5c98db65d4-cqzb7   1/1     Running   0          7d18h
coredns-5c98db65d4-xc5d7   1/1     Running   0          7d18h

Check for Errors in the DNS pod

$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
.:53
2019-10-28T13:40:41.834Z [INFO] CoreDNS-1.3.1
2019-10-28T13:40:41.834Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-10-28T13:40:41.834Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843
.:53
2019-10-28T13:40:42.870Z [INFO] CoreDNS-1.3.1
2019-10-28T13:40:42.870Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-10-28T13:40:42.870Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843

Is DNS service up?

$ kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   7d18h

Are DNS endpoints exposed?

$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.244.0.3:53,10.244.0.4:53,10.244.0.3:53 + 3 more...   7d18h

Are DNS queries being received/processed?

Following the guide, I updated the coredns ConfigMap to enable query logging (the Corefile edit is sketched after the logs), re-ran the nslookup kubernetes.default command, and here is the result:

$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
.:53
2019-10-28T13:40:41.834Z [INFO] CoreDNS-1.3.1
2019-10-28T13:40:41.834Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-10-28T13:40:41.834Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843
[INFO] Reloading
2019-11-05T08:12:12.511Z [INFO] plugin/reload: Running configuration MD5 = 906291470f7b1db8bef629bdd0056cad
[INFO] Reloading complete
2019-11-05T08:12:12.608Z [INFO] 127.0.0.1:55754 - 7434 "HINFO IN 4808438627636259158.5471394156194192600. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.095189791s
.:53
2019-10-28T13:40:42.870Z [INFO] CoreDNS-1.3.1
2019-10-28T13:40:42.870Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-10-28T13:40:42.870Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843
[INFO] Reloading
2019-11-05T08:12:47.988Z [INFO] plugin/reload: Running configuration MD5 = 906291470f7b1db8bef629bdd0056cad
[INFO] Reloading complete
2019-11-05T08:12:48.004Z [INFO] 127.0.0.1:51911 - 60104 "HINFO IN 4077052818408395245.3902243105088660270. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.016522153s

So it seems that DNS pods are receiving the requests.
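
For reference, the ConfigMap edit (kubectl -n kube-system edit configmap coredns) consists of adding the log plugin to the Corefile, which on a kubeadm cluster of that era looked roughly like this (a sketch, not the exact file):

.:53 {
    errors
    log
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}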

But I had this error already!

That error happened to me the first time I deployed the cluster.

At that time, I noticed that kubectl get nodes -o wide was showing the worker nodes' public IP addresses as "INTERNAL-IP" instead of the private ones.

Looking further, I found out that on the worker nodes kubelet was missing the --node-ip flag, so I added it and restarted kubelet, and the issue was gone. I then concluded that the missing flag was the reason, but that seems not to be the case, since the kubectl get nodes -o wide command now shows the private IP addresses as "INTERNAL-IP" for the workers.
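
For reference, on a kubeadm-provisioned node the flag can be supplied through KUBELET_EXTRA_ARGS, e.g. in /etc/default/kubelet (a sketch: the file path varies by distribution, and the IP is worker 1's VPN address from the list above):

# /etc/default/kubelet
KUBELET_EXTRA_ARGS=--node-ip=10.0.0.247

$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet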

And now

The DNS server IP address 10.96.0.10 looks wrong to me, and I can't ping it from the pod. The DNS pods have the IP addresses 10.244.0.3 and 10.244.0.4, which I can't ping either.
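
(Side note: a Service ClusterIP like 10.96.0.10 is a virtual IP implemented by kube-proxy, so it generally doesn't answer ICMP even when the Service works; a direct query against it is a more meaningful test, e.g.:)

$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10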

I just tried deleting the coredns pods so that they would be rescheduled. Their IP addresses have now changed, I can ping them from the pod, and kubectl exec -ti busybox -- nslookup kubernetes.default works:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
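
(For reference, the pods can be deleted with the same label selector used in the checks above, after which their Deployment recreates them; a sketch:)

$ kubectl delete pods --namespace=kube-system -l k8s-app=kube-dns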

But the resolv.conf file still contains the "invalid" entry:

$ kubectl exec busybox cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local invalid
options ndots:5

  • Can anyone explain to me what happened, please?
  • And how can I get rid of this "invalid" entry in the resolv.conf file?
Hairbreadth answered 5/11, 2019 at 8:37

Comments (2):
After restarting the CoreDNS pods, did you re-deploy the "busybox" pod? – Sepoy
No, I didn't restart it. – Hairbreadth

As configured in the CoreDNS ConfigMap (via the forward . /etc/resolv.conf line in the default Corefile), the default upstream nameservers are inherited from the node; that is, every query for a name outside the cluster domain (.cluster.local) is forwarded to them.

So "invalid" is an entry copied from Node's /etc/resolv.conf file during Pod creation.

If you manually modify /etc/resolv.conf on your node, every Pod with dnsPolicy: ClusterFirst will inherit /etc/resolv.conf with this modification.

So, after adding the --node-ip flag to kubelet and restarting the CoreDNS Pods, you should re-deploy your busybox Pod so that it inherits /etc/resolv.conf from the node.
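
A minimal sketch of that sequence, assuming the stray "search invalid" line has already been removed from each node's /etc/resolv.conf (how it got there depends on whatever manages that file, e.g. DHCP or resolvconf):

# on each node: fix /etc/resolv.conf first, then from the master:
$ kubectl delete pods --namespace=kube-system -l k8s-app=kube-dns   # DNS pods re-read the node file on recreation
$ kubectl delete pod busybox
$ kubectl apply -f busybox.yaml   # busybox.yaml: hypothetical name for the test pod manifest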

Colleen answered 8/11, 2019 at 15:19

Comments (2):
Thank you @KFC_ for your answer, you are right: the /etc/resolv.conf file on the node contains search invalid. Thank you very much! – Hairbreadth
I confirm that after fixing the /etc/resolv.conf file on the node itself, deleting the coredns pods, and redeploying the busybox pod, the /etc/resolv.conf file is valid, and the kubectl exec -ti busybox -- nslookup kubernetes.default command works. Again, thank you! – Hairbreadth
