Kubernetes DNS lookup not working from worker node - connection timed out; no servers could be reached

I have built a new Kubernetes cluster (v1.20.1) with a single master and a single worker node, using the Calico CNI.

I deployed a busybox pod in the default namespace.

# kubectl get pods busybox -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          12m   10.203.0.129   node02   <none>           <none>

 

nslookup is not working:

kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'

The cluster is running RHEL 8 with the latest updates.

I followed the steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/.
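
(For reference, the dnsutils pod used below comes from the manifest linked on that page – roughly:)

kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl get pods dnsutils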

The nslookup command is not able to reach the nameserver:

# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1

The resolv.conf file:

# kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local 
nameserver 10.96.0.10
options ndots:5

The DNS pods are running:

# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-472vx   1/1     Running   1          85m
coredns-74ff55c5b-c75bq   1/1     Running   1          85m

DNS pod logs:

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d

The DNS service is defined:

# kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   86m

I can see the endpoints of the DNS pods:

# kubectl get endpoints kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.203.0.5:53,10.203.0.6:53,10.203.0.5:53 + 3 more...   86m

I enabled query logging, but didn't see any traffic coming to the DNS pods:

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
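
(For context, query logging is typically turned on by adding the log plugin to the Corefile in the coredns ConfigMap – a minimal sketch, assuming the default kubeadm-generated ConfigMap:)

kubectl -n kube-system edit configmap coredns
# add the single word "log" at the top of the ".:53 { ... }" server block;
# the default Corefile ships with the reload plugin, so CoreDNS should pick
# up the change and start logging every query it receives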

I can ping the DNS pod:

# kubectl exec -i -t dnsutils -- ping 10.203.0.5
PING 10.203.0.5 (10.203.0.5): 56 data bytes
64 bytes from 10.203.0.5: seq=0 ttl=62 time=6.024 ms
64 bytes from 10.203.0.5: seq=1 ttl=62 time=6.052 ms
64 bytes from 10.203.0.5: seq=2 ttl=62 time=6.175 ms
64 bytes from 10.203.0.5: seq=3 ttl=62 time=6.000 ms
^C
--- 10.203.0.5 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 6.000/6.062/6.175 ms

nmap shows the ports as filtered:

# ke netshoot-6f677d4fdf-5t5cb -- nmap 10.203.0.5
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:29 UTC
Nmap scan report for 10.203.0.5
Host is up (0.0060s latency).
Not shown: 997 closed ports
PORT     STATE    SERVICE
53/tcp   filtered domain
8080/tcp filtered http-proxy
8181/tcp filtered intermapper

Nmap done: 1 IP address (1 host up) scanned in 14.33 seconds

If I schedule the pod on the master node, nslookup works and nmap shows the port as open:

# ke netshoot -- bash
bash-5.0# nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

 nmap -p 53 10.96.0.10
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:46 UTC
Nmap scan report for kube-dns.kube-system.svc.cluster.local (10.96.0.10)
Host is up (0.000098s latency).

PORT   STATE SERVICE
53/tcp open  domain

Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds

Why is nslookup not working from a pod running on the worker node? How can I troubleshoot this issue?

I have rebuilt the servers twice and still see the same issue.

Thanks

SR

Update: adding the kubeadm config file

# cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  kubeletExtraArgs:
    cgroup-driver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "master01:6443"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.0.0.0/14
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

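For completeness, a config file like this is normally consumed by kubeadm along these lines (a sketch; the Calico install and join steps are whatever you used):

kubeadm init --config kubeadm-config.yaml
# then apply the Calico manifests and join the worker with the
# "kubeadm join ..." command printed at the end of kubeadm init
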
Reverberatory answered 15/1, 2021 at 22:49 Comment(14)
Not an expert, but wondering if there is any network policy messing things up? – Faus
There are several busybox images with problems when it comes to their installed net tools. Can you try again with the image busybox:1.28? It might very well not be the issue, but if it is, this will save you a lot of troubleshooting time. – Kerosene
meaningqo is right. You can find out more about this issue on GitHub. – Saturation
Tried with busybox:1.28, still the same error: nslookup: can't resolve 'kubernetes.default'. If I run busybox on the master, nslookup works; from the worker node it does not. – Reverberatory
How did you deploy Calico? Can you check the statuses and logs of your Calico pods (especially the one on your worker node)? – Saturation
@Saturation I checked the logs of both calico-node-6tbwh and kube-proxy-5l2qb; no error messages. Since I was able to ping the DNS server, I think the network route is good. TCP traffic is not getting from the worker node to the master node. – Reverberatory
Doesn't the pod-network-cidr overlap with the IP range of your VMs? – Saturation
No, my host network is 192.x and the pod network is 10.x. Also, ping works from pod to pod on different hosts. – Reverberatory
Do you allow connections to TCP and UDP port 53 for DNS access? Did you disable SELinux? Did you bootstrap k8s using kubeadm with default settings plus a custom pod-network CIDR? Is it possible for you to share your exact pod-network-cidr and service-cidr? Do you have any additional cluster configuration, e.g. a k8s network policy or a Calico network policy? Do you meet these requirements? Did you follow the official docs about installing and configuring k8s on RHEL 8? – Saturation
@Saturation Yes, SELinux and the firewall are disabled; I added the kubeadm config file used to set up this cluster. The same configuration works with RHEL 7; the issue is only with RHEL 8. – Reverberatory
Any progress on this? Stuck in a similar situation. – Consistence
@Consistence Sorry, I didn't find any fix for this. I tried the steps from Alan Reed's post, but it didn't work. I also posted another question but didn't get any answer. #65756490 – Reverberatory
Do you think it could be the version? I am also on v1.20, but I saw this post #63653789 and I am thinking of downgrading to v1.19. What do you think? – Consistence
I am also on v1.20.1; I'm not sure it's an issue with Kubernetes. I changed to RHEL 7 with v1.20.1 and nslookup works. – Reverberatory

First of all, please note that according to both the Calico and kubeadm documentation, only CentOS/RHEL 7+ is supported.

By default, RHEL 8 uses nftables instead of iptables (we can still use iptables, but "iptables" on RHEL 8 actually uses the kernel's nft framework in the background - see "Running Iptables on RHEL 8").

9.2.1. nftables replaces iptables as the default network packet filtering framework
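
(You can confirm which backend the iptables binary uses on a node; on RHEL 8 it typically reports the nf_tables variant – the exact version string will differ:)

iptables --version
# on RHEL 8 this prints something like: iptables v1.8.x (nf_tables)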

I believe that nftables may be causing these network issues because, as we can find on the nftables adoption page:

Kubernetes does not support nftables yet.

Note: For now, I highly recommend using RHEL 7 instead of RHEL 8.


With that in mind, I'll present some information that may help you with RHEL 8.
I have reproduced your issue and found a solution that works for me.

  • First, I opened the ports required by Calico - these ports can be found here under "Network requirements".
  • Next, as a workaround, I reverted to the old iptables backend on all cluster nodes; you can easily do so by setting FirewallBackend in /etc/firewalld/firewalld.conf to iptables, as described here.
  • Finally, I restarted firewalld to make the new rules active (the commands are sketched right after this list).
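
A sketch of these steps (the exact Calico port/protocol list depends on your data plane (BGP, IPIP or VXLAN), so verify it against the "Network requirements" page; the ports below are the common ones):

# open ports commonly needed by kubeadm + Calico (adjust to your setup)
firewall-cmd --permanent --add-port=6443/tcp     # Kubernetes API server
firewall-cmd --permanent --add-port=10250/tcp    # kubelet API
firewall-cmd --permanent --add-port=179/tcp      # Calico BGP
firewall-cmd --permanent --add-port=4789/udp     # Calico VXLAN (if enabled)
firewall-cmd --permanent --add-port=5473/tcp     # Calico Typha (if used)
# if you use IPIP, also allow IP-in-IP (protocol 4) traffic between nodes

# workaround: switch firewalld back to the iptables backend
sed -i 's/^FirewallBackend=.*/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf

# restart firewalld so the backend change and the new rules take effect
systemctl restart firewalld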

I've tried nslookup from a Pod running on the worker node (kworker) and it seems to work correctly.

root@kmaster:~# kubectl get pod,svc -o wide
NAME      READY   STATUS    RESTARTS   AGE    IP           NODE      NOMINATED NODE   READINESS GATES
pod/web   1/1     Running   0          112s   10.99.32.1   kworker   <none>           <none>

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes   ClusterIP   10.99.0.1    <none>        443/TCP   5m51s   <none>
root@kmaster:~# kubectl exec -it web -- bash
root@web:/# nslookup kubernetes.default
Server:         10.99.0.10
Address:        10.99.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.99.0.1

root@web:/#
Saturation answered 1/2, 2021 at 11:39 Comment(5)
Thanks for the info. Let me try this and update the status. – Reverberatory
I have changed to RHEL 7. – Reverberatory
Another post with a workaround for nftables: mihail-milev.medium.com/… – Reverberatory
Based on this document it supports RHEL 8: github.com/kubernetes/kops/blob/master/docs/operations/… – Reverberatory
How to fix this issue on Ubuntu 20? – Flub

I ran into the same issue setting up a vanilla kubeadm 1.25 cluster on RHEL 8, and @matt_j's answer led me to another solution that avoids nftables by using IPVS mode in kube-proxy.

Just modify the kube-proxy ConfigMap in the kube-system namespace so that the config.conf key has this value:

...
data:
  config.conf: |
    ...
    mode: "ipvs"
...

Then make sure kube-proxy (or your nodes) is restarted.
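
For example, one way to apply this (standard kubectl commands; the exact log wording varies by version, and the pod name below is a placeholder):

kubectl -n kube-system edit configmap kube-proxy        # set mode: "ipvs" in config.conf
kubectl -n kube-system rollout restart daemonset kube-proxy
# verify: kube-proxy logs should mention the ipvs proxier after the restart
kubectl -n kube-system get pods -l k8s-app=kube-proxy
kubectl -n kube-system logs <one-of-the-kube-proxy-pods> | grep -i ipvs

Note that IPVS mode also requires the ip_vs kernel modules to be available on each node.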

Siberia answered 19/10, 2022 at 20:38 Comment(0)

In my situation, we're using a K3s cluster, and a newly added agent couldn't make DNS queries with the default (ClusterFirst) DNS policy. After a lot of research, I found that I needed to set the kube-proxy cluster-cidr argument to get DNS working.
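
If it helps, on K3s this kind of override can be passed straight through to the embedded kube-proxy; a hypothetical example (10.42.0.0/16 is K3s's default cluster CIDR, so substitute your own):

# pass an explicit cluster CIDR to kube-proxy on a K3s server
k3s server --kube-proxy-arg=cluster-cidr=10.42.0.0/16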

Hope this info is useful for others.

Stockpile answered 5/12, 2021 at 9:46 Comment(2)
Could you elaborate on what you mean by using those arguments to make DNS work? – Emmott
I guess the cluster CIDR needs to be specified so that kube-proxy knows the correct IP range, especially since there are multiple network cards on the instance. – Stockpile
