Kubernetes DNS lookup not working from worker node - connection timed out; no servers could be reached

I have built a new Kubernetes cluster (v1.20.1) with a single master and a single worker node, using the Calico CNI.

I deployed a busybox pod in the default namespace.

# kubectl get pods busybox -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          12m   10.203.0.129   node02   <none>           <none>

 

nslookup is not working:

kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'

The cluster is running RHEL 8 with the latest updates.

I followed the steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/.
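
(For reference, the dnsutils pod used below comes from the manifest linked on that page – roughly:)

kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl get pods dnsutils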

The nslookup command is not able to reach the nameserver:

# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1

The resolv.conf file:

# kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local 
nameserver 10.96.0.10
options ndots:5

The DNS pods are running:

# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-472vx   1/1     Running   1          85m
coredns-74ff55c5b-c75bq   1/1     Running   1          85m

DNS pod logs:

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d

The DNS service is defined:

# kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   86m

I can see the endpoints of the DNS pods:

# kubectl get endpoints kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.203.0.5:53,10.203.0.6:53,10.203.0.5:53 + 3 more...   86m

I enabled query logging, but didn't see any traffic coming to the DNS pods:

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
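
(For context, query logging is typically turned on by adding the log plugin to the Corefile in the coredns ConfigMap – a minimal sketch, assuming the default kubeadm-generated ConfigMap:)

kubectl -n kube-system edit configmap coredns
# add the single word "log" at the top of the ".:53 { ... }" server block;
# the default Corefile ships with the reload plugin, so CoreDNS should pick
# up the change and start logging every query it receives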

I can ping the DNS pod:

# kubectl exec -i -t dnsutils -- ping 10.203.0.5
PING 10.203.0.5 (10.203.0.5): 56 data bytes
64 bytes from 10.203.0.5: seq=0 ttl=62 time=6.024 ms
64 bytes from 10.203.0.5: seq=1 ttl=62 time=6.052 ms
64 bytes from 10.203.0.5: seq=2 ttl=62 time=6.175 ms
64 bytes from 10.203.0.5: seq=3 ttl=62 time=6.000 ms
^C
--- 10.203.0.5 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 6.000/6.062/6.175 ms

nmap shows the ports as filtered:

# ke netshoot-6f677d4fdf-5t5cb -- nmap 10.203.0.5
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:29 UTC
Nmap scan report for 10.203.0.5
Host is up (0.0060s latency).
Not shown: 997 closed ports
PORT     STATE    SERVICE
53/tcp   filtered domain
8080/tcp filtered http-proxy
8181/tcp filtered intermapper

Nmap done: 1 IP address (1 host up) scanned in 14.33 seconds

If I schedule the pod on the master node, nslookup works and nmap shows the port as open:

# ke netshoot -- bash
bash-5.0# nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

 nmap -p 53 10.96.0.10
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:46 UTC
Nmap scan report for kube-dns.kube-system.svc.cluster.local (10.96.0.10)
Host is up (0.000098s latency).

PORT   STATE SERVICE
53/tcp open  domain

Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds

Why is nslookup not working from a pod running on the worker node? How can I troubleshoot this issue?

I have rebuilt the servers twice and still see the same issue.

Thanks

SR

Update: adding the kubeadm config file

# cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  kubeletExtraArgs:
    cgroup-driver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "master01:6443"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.0.0.0/14
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

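For completeness, a config file like this is normally consumed by kubeadm along these lines (a sketch; the Calico install and join steps are whatever you used):

kubeadm init --config kubeadm-config.yaml
# then apply the Calico manifests and join the worker with the
# "kubeadm join ..." command printed at the end of kubeadm init
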
Reverberatory answered 15/1, 2021 at 22:49 Comment(14)
Not an expert, but wondering if there is any network policy messing things up? – Faus
There are several busybox images with problems when it comes to their installed net tools. Can you try again with the image busybox:1.28? It might very well not be the issue, but if it is, this will save you a lot of troubleshooting time. – Kerosene
meaningqo is right. You can find out more about this issue on GitHub. – Saturation
Tried with busybox:1.28, still the same error: nslookup: can't resolve 'kubernetes.default'. If I run busybox on the master, nslookup works; from the worker node it does not. – Reverberatory
How did you deploy Calico? Can you check the statuses and logs of your Calico pods (especially the one on your worker node)? – Saturation
@Saturation I checked the logs of both calico-node-6tbwh and kube-proxy-5l2qb; no error messages. Since I was able to ping the DNS server, I think the network route is good. TCP traffic is not getting from the worker node to the master node. – Reverberatory
Doesn't the pod-network-cidr overlap with the IP range of your VMs? – Saturation
No, my host network is 192.x and the pod network is 10.x. Also, ping works from pod to pod on different hosts. – Reverberatory
Do you allow connections to TCP and UDP port 53 for DNS access? Did you disable SELinux? Did you bootstrap k8s using kubeadm with default settings plus a custom pod-network CIDR? Is it possible for you to share your exact pod-network-cidr and service-cidr? Do you have any additional cluster configuration, e.g. a k8s network policy or a Calico network policy? Do you meet these requirements? Did you follow the official docs about installing and configuring k8s on RHEL 8? – Saturation
@Saturation Yes, SELinux and the firewall are disabled; I added the kubeadm config file used to set up this cluster. The same configuration works with RHEL 7; the issue is only with RHEL 8. – Reverberatory
Any progress on this? Stuck in a similar situation. – Consistence
@Consistence Sorry, I didn't find any fix for this. I tried the steps from Alan Reed's post, but it didn't work. I also posted another question but didn't get any answer. #65756490 – Reverberatory
Do you think it could be the version? I am also on v1.20, but I saw this post #63653789 and I am thinking of downgrading to v1.19. What do you think? – Consistence
I am also on v1.20.1; I'm not sure it's an issue with Kubernetes. I changed to RHEL 7 with v1.20.1 and nslookup works. – Reverberatory

First of all, please note that according to both the Calico and kubeadm documentation, only CentOS/RHEL 7+ is supported.

By default, RHEL 8 uses nftables instead of iptables (we can still use iptables, but "iptables" on RHEL 8 actually uses the kernel's nft framework in the background - see "Running Iptables on RHEL 8").

9.2.1. nftables replaces iptables as the default network packet filtering framework
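
(You can confirm which backend the iptables binary uses on a node; on RHEL 8 it typically reports the nf_tables variant – the exact version string will differ:)

iptables --version
# on RHEL 8 this prints something like: iptables v1.8.x (nf_tables)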

I believe that nftables may be causing these network issues because, as we can find on the nftables adoption page:

Kubernetes does not support nftables yet.

Note: For now, I highly recommend using RHEL 7 instead of RHEL 8.


With that in mind, I'll present some information that may help you with RHEL 8.
I have reproduced your issue and found a solution that works for me.

  • First, I opened the ports required by Calico - these ports can be found here under "Network requirements".
  • Next, as a workaround, I reverted to the old iptables backend on all cluster nodes; you can easily do so by setting FirewallBackend in /etc/firewalld/firewalld.conf to iptables, as described here.
  • Finally, I restarted firewalld to make the new rules active (the commands are sketched right after this list).
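
A sketch of these steps (the exact Calico port/protocol list depends on your data plane (BGP, IPIP or VXLAN), so verify it against the "Network requirements" page; the ports below are the common ones):

# open ports commonly needed by kubeadm + Calico (adjust to your setup)
firewall-cmd --permanent --add-port=6443/tcp     # Kubernetes API server
firewall-cmd --permanent --add-port=10250/tcp    # kubelet API
firewall-cmd --permanent --add-port=179/tcp      # Calico BGP
firewall-cmd --permanent --add-port=4789/udp     # Calico VXLAN (if enabled)
firewall-cmd --permanent --add-port=5473/tcp     # Calico Typha (if used)
# if you use IPIP, also allow IP-in-IP (protocol 4) traffic between nodes

# workaround: switch firewalld back to the iptables backend
sed -i 's/^FirewallBackend=.*/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf

# restart firewalld so the backend change and the new rules take effect
systemctl restart firewalld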

I've tried nslookup from a Pod running on the worker node (kworker) and it seems to work correctly.

root@kmaster:~# kubectl get pod,svc -o wide
NAME      READY   STATUS    RESTARTS   AGE    IP           NODE      NOMINATED NODE   READINESS GATES
pod/web   1/1     Running   0          112s   10.99.32.1   kworker   <none>           <none>

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes   ClusterIP   10.99.0.1    <none>        443/TCP   5m51s   <none>
root@kmaster:~# kubectl exec -it web -- bash
root@web:/# nslookup kubernetes.default
Server:         10.99.0.10
Address:        10.99.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.99.0.1

root@web:/#
Saturation answered 1/2, 2021 at 11:39 Comment(5)
Thanks for the info. Let me try this and update the status. – Reverberatory
I have changed to RHEL 7. – Reverberatory
Another post with a workaround for nftables: mihail-milev.medium.com/… – Reverberatory
Based on this document it supports RHEL 8: github.com/kubernetes/kops/blob/master/docs/operations/… – Reverberatory
How to fix this issue on Ubuntu 20? – Flub

I ran into the same issue setting up a vanilla kubeadm 1.25 cluster on RHEL 8, and @matt_j's answer led me to another solution that avoids nftables by using IPVS mode in kube-proxy.

Just modify the kube-proxy ConfigMap in the kube-system namespace so that the config.conf key has this value:

...
data:
  config.conf: |
    ...
    mode: "ipvs"
...

Then make sure kube-proxy (or your nodes) is restarted.
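
For example, one way to apply this (standard kubectl commands; the exact log wording varies by version, and the pod name below is a placeholder):

kubectl -n kube-system edit configmap kube-proxy        # set mode: "ipvs" in config.conf
kubectl -n kube-system rollout restart daemonset kube-proxy
# verify: kube-proxy logs should mention the ipvs proxier after the restart
kubectl -n kube-system get pods -l k8s-app=kube-proxy
kubectl -n kube-system logs <one-of-the-kube-proxy-pods> | grep -i ipvs

Note that IPVS mode also requires the ip_vs kernel modules to be available on each node.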

Siberia answered 19/10, 2022 at 20:38 Comment(0)

In my situation, we're using a K3s cluster, and a newly added agent couldn't make DNS queries with the default (ClusterFirst) DNS policy. After a lot of research, I found that I needed to set the kube-proxy cluster-cidr argument to get DNS working.
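
If it helps, on K3s this kind of override can be passed straight through to the embedded kube-proxy; a hypothetical example (10.42.0.0/16 is K3s's default cluster CIDR, so substitute your own):

# pass an explicit cluster CIDR to kube-proxy on a K3s server
k3s server --kube-proxy-arg=cluster-cidr=10.42.0.0/16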

Hope this info is useful for others.

Stockpile answered 5/12, 2021 at 9:46 Comment(2)
Could you elaborate on what you mean by using those arguments to make DNS work? – Emmott
I guess the cluster CIDR needs to be specified so that kube-proxy knows the correct IP range, especially since there are multiple network cards on the instance. – Stockpile
