How can I get CoreDNS to resolve on my Raspberry Pi Kubernetes cluster?

I've followed a number of online tutorials to set up a Kubernetes cluster on four Raspberry Pi 4s. I ended up using Flannel as the networking plugin, as that seems to be the only one that actually works on the RPi, with a pod network CIDR of 10.244.0.0/16, per this guide from 2017. Almost everything is working: all of the base pods in the kube-system namespace are running/healthy, and I can pull down images and launch new containers. At first I wasn't able to get any pod logs, but that was quickly remedied by opening up port 10250 on each node.

But there still seems to be a problem with DNS resolution. I should clarify that DNS resolution on the hosts clearly does work, since the cluster is able to download any container image I specify. But once a container is running, it isn't able to "dial out" to anything. As a test, I'm running the arm32v7/buildpack-deps:latest container in a pod. It pulls the image from Docker Hub just fine. But when I shell into it and simply type curl https://www.google.com, it hangs before eventually timing out. And the same is true of any pod I launch that needs to interact with the external Internet: they hang and hang and hang.

Here are all the networking-related commands I've already run on each node:

sudo iptables -P FORWARD ACCEPT
sudo iptables -A FORWARD -i cni0 -j ACCEPT
sudo iptables -A FORWARD -o cni0 -j ACCEPT
sudo ufw allow ssh
sudo ufw allow 443  # can't remember why I ran this one
sudo ufw allow 6443
sudo ufw allow 8080 # this one might not be strictly necessary, either
sudo ufw allow 10250
sudo ufw default allow routed
sudo ufw enable

I'm not entirely sure that the last two iptables commands did anything; I grabbed them from the comments section of the guide I linked to earlier. I know that guide assumes one is using kube-dns, but it's also three years old, so I'm using the (newer) default, CoreDNS, instead.
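
For what it's worth, the FORWARD chain's packet counters show whether those rules ever match anything (a zero count means a rule has never fired):

sudo iptables -L FORWARD -v -n --line-numbers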

What am I missing? I feel like I'm so close to having this cluster fully operational, but obviously I need functioning DNS!

UPDATE: I know that it's a DNS problem, and not general Internet connectivity, for two reasons: (1) the cluster itself can pull down any image I specify from Docker Hub, and (2) when I shell into a running container that has curl and execute curl -H "Host: www.google.com" 142.250.73.206, it successfully returns the Google homepage HTML. But, as mentioned, if I run my earlier curl command using the hostname, it times out.
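
Side by side, from inside the pod (142.250.73.206 was simply an address that www.google.com resolved to at the time; substitute a current one):

curl -H "Host: www.google.com" 142.250.73.206   # works: returns HTML
curl https://www.google.com                     # hangs, then times out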

Scour asked 12/11, 2020 at 13:05 – Comments (9)
This discussion might help: github.com/kubernetes/kubernetes/issues/44833 – Divergency
That is interesting/promising! When I look at the contents of /etc/resolv.conf in a running container, it is indeed pointing at 10.96.0.10. The command I had originally used to launch the cluster was this one: sudo kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.194 (because 192.168.1.194 is the local IP of my master node RPi). Based on what that comment says, it sounds like I need to add an extra option to that: --cluster-dns=100.64.0.10. Does that sound correct to you @ofirule? Or should I use my 192.168.1.194 IP instead? – Scour
According to github.com/kubernetes/kubernetes/issues/33881, the kube-dns service is part of the network CIDR and probably goes to 10.244.0.10, but this is something you have to check. – Divergency
I don't think that's it. When I shell into a running container and type dig @10.244.0.10 google.com, it still times out. – Scour
I guess cluster-dns=8.8.8.8 would work, but this is not the Kubernetes way. – Divergency
I'm sure that would work, but I put a bounty on this question because I'd really love to know what the Kubernetes way is. How can I determine which internal IP my cluster is using for DNS? And how can I get it working? (Do I need to allow any additional ports with ufw?) – Scour
Plus, I don't think that would work anyway for anything other than external DNS. I would lose the ability for pods to talk to services if they always defaulted to 8.8.8.8, wouldn't I? – Scour
The configuration inside the pods seems correct to me. Is CoreDNS running? (kubectl get pods -n kube-system) – Flesher
Yes, coredns has two pods running. If I add -o wide to your command to see more, I can tell you that the two pods were assigned internal IPs 10.244.0.4 and 10.244.0.5, respectively. And they seem stable: they haven't restarted. They're both running on the master node. Unfortunately, if I try running dig @10.244.0.4 google.com from inside a container, that times out too :( – Scour
  1. Create a simple Pod to use as a test environment for diagnosing DNS (save the manifest below as dnsutils.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
$ kubectl apply -f dnsutils.yaml
  2. Check the status of the Pod:
$ kubectl get pods dnsutils
NAME      READY     STATUS    RESTARTS   AGE
dnsutils   1/1       Running   0          <some-time>

Once that Pod is running, you can exec nslookup in that environment. If you see something like the following, DNS is working correctly.

$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default

Server:    10.0.0.10
Address 1: 10.0.0.10

Name:      kubernetes.default
Address 1: 10.0.0.1

If the nslookup command fails, check the following:

  3. Take a look inside the resolv.conf file:
kubectl exec -ti dnsutils -- cat /etc/resolv.conf

Verify that the search path and name server are set up like the following (note that the search path may vary for different cloud providers):

search default.svc.cluster.local svc.cluster.local cluster.local google.internal c.gce_project_id.internal
nameserver 10.0.0.10
options ndots:5
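
In a kubeadm cluster, the nameserver should normally be the ClusterIP of the kube-dns Service (see step 6). One way to print it directly for comparison:

$ kubectl get svc kube-dns --namespace=kube-system -o jsonpath='{.spec.clusterIP}'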

Errors such as the following indicate a problem with the CoreDNS (or kube-dns) add-on or with associated Services:

$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default

Server:    10.0.0.10
Address 1: 10.0.0.10

nslookup: can't resolve 'kubernetes.default'

OR

Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
  4. Check if the DNS pod is running:
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

NAME                       READY     STATUS    RESTARTS   AGE
...
coredns-7b96bf9f76-5hsxb   1/1       Running   0           1h
coredns-7b96bf9f76-mvmmt   1/1       Running   0           1h
...
  5. Check for errors in the DNS pod. Here is an example of a healthy CoreDNS log:
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns

.:53
2018/08/15 14:37:17 [INFO] CoreDNS-1.2.2
2018/08/15 14:37:17 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.2
linux/amd64, go1.10.3, 2e322f6
2018/08/15 14:37:17 [INFO] plugin/reload: Running configuration MD5 = 24e6c59e83ce706f07bcc82c31b1ea1c
  6. Verify that the DNS service is up by using the kubectl get service command:
$ kubectl get svc --namespace=kube-system

NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
...
kube-dns     ClusterIP   10.0.0.10      <none>        53/UDP,53/TCP        1h
...
  7. Verify that DNS endpoints are exposed by using the kubectl get endpoints command:
$ kubectl get endpoints kube-dns --namespace=kube-system

NAME       ENDPOINTS                       AGE
kube-dns   10.180.3.17:53,10.180.3.17:53    1h
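
If the ENDPOINTS column is empty here, the CoreDNS pods are most likely not passing their readiness checks; a reasonable next step is to describe them and look at the events:

$ kubectl describe pods --namespace=kube-system -l k8s-app=kube-dns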
  8. Verify whether queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns. To edit it, use the command:
$ kubectl -n kube-system edit configmap coredns

Then add log in the Corefile section per the example below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

After saving the changes, it may take a minute or two for Kubernetes to propagate them to the CoreDNS pods. Next, make some queries and view the logs per the sections above in this document. If the CoreDNS pods are receiving the queries, you should see them in the logs.
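
For example, one way to do that, reusing the dnsutils Pod from step 1, is to issue a lookup and then follow the CoreDNS logs:

$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns -f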

Here is an example of a query in the log:

.:53
2018/08/15 14:37:15 [INFO] CoreDNS-1.2.0
2018/08/15 14:37:15 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.0
linux/amd64, go1.10.3, 2e322f6
2018/09/07 15:29:04 [INFO] plugin/reload: Running configuration MD5 = 162475cdf272d8aa601e6fe67a6ad42f
2018/09/07 15:29:04 [INFO] Reloading complete
172.17.0.18:41675 - [07/Sep/2018:15:29:11 +0000] 59925 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000066649s
Codi answered 21/11, 2020 at 2:23 – Comments (8)
Fails right at step 2, with this response from the nslookup command: ;; connection timed out; no servers could be reached – Scour
Here's what /etc/resolv.conf looks like: nameserver 10.96.0.10 search default.svc.cluster.local svc.cluster.local cluster.local options ndots:5 – Scour
My coredns pod logs have multiple references to "Reflector ListAndWatch". The most recent log message says this: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": net/http: TLS handshake timeout plugin/ready: Still waiting on: "kubernetes" – Scour
In fact, there are a lot of "Failed to list ..." logs that all reference 10.96.0.1 with a timeout. I'm guessing that it's pointing at an invalid IP or something? I'm not sure what anything in the 10.96.0.xx range is supposed to be. – Scour
What's the output of this command? kubectl get svc --namespace=kube-system – Codi
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 11d – Scour
Well, it looks like your coredns pods have problems talking to the API server. Could you add kubectl get pod -A -o wide and kubectl get events -A output to your post? Also, can you see anything suspicious in the API server logs? – Martyrology
Also check kubectl describe node for resource pressure events. – Martyrology

As pointed out in the comments: the kubeadm configuration seems fine.
Your pods have the correct /etc/resolv.conf, and they should work.

It's hard to pin down the problem definitively; many things could be going wrong here.
My guess: something isn't right with ufw.
You can test that easily: disable ufw on all nodes (with ufw disable).

I'm not a hundred percent sure which ports are needed. I use iptables for my single-node k8s, and at the start I had many problems with FORWARD vs. INPUT rules. In Docker, all ports are forwarded.
So I guess there is something wrong with the FORWARD rules and/or the DNS ports (53/udp and 53/tcp); see the sketch below.
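
If disabling ufw does fix it, something like this might be a starting point for re-enabling it. This is only a sketch under my assumptions (kubeadm defaults, Flannel's VXLAN backend on 8472/udp); I haven't verified the exact list:

sudo ufw allow 6443/tcp        # Kubernetes API server
sudo ufw allow 10250/tcp       # kubelet API (needed for logs/exec)
sudo ufw allow 8472/udp        # Flannel VXLAN overlay (pod traffic between nodes)
sudo ufw allow 53/udp          # DNS queries to CoreDNS
sudo ufw allow 53/tcp          # DNS over TCP (truncated/large responses)
sudo ufw default allow routed  # let forwarded (pod) traffic through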

Good luck.

Flesher answered 21/11, 2020 at 21:47 – Comments (0)
