Resolving external domains from within pods does not work

What happened

Resolving an external domain from within a pod fails with a SERVFAIL message, and the logs show i/o timeout errors.

What I expected to happen

External domains should resolve successfully from within pods.

How to reproduce it

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
1. Create the pod above (from the Debugging DNS Resolution help page).

2. Run kubectl exec dnsutils -it -- nslookup google.com:

    pig@pig202:~$ kubectl exec dnsutils -it -- nslookup google.com
    Server:      10.152.183.10
    Address: 10.152.183.10#53
    
    ** server can't find google.com.mshome.net: SERVFAIL
    
    command terminated with exit code 1
    
3. Also run kubectl exec dnsutils -it -- nslookup google.com. (the trailing dot makes the name fully qualified, so no search domain is appended):

    pig@pig202:~$ kubectl exec dnsutils -it -- nslookup google.com.
    Server:      10.152.183.10
    Address: 10.152.183.10#53
    
    ** server can't find google.com: SERVFAIL
    
    command terminated with exit code 1
    

Additional information

I am using a MicroK8s environment in a Hyper-V virtual machine.

DNS resolution from the virtual machine itself works, and Kubernetes is able to pull container images. Resolution fails only from within the pods, which means I cannot reach the Internet from inside them.

This is OK:

pig@pig202:~$ kubectl exec dnsutils -it -- nslookup kubernetes.default
Server:     10.152.183.10
Address:    10.152.183.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.152.183.1

Environment

The version of CoreDNS

image: 'coredns/coredns:1.6.6'

Corefile (taken from ConfigMap)

  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        log . {
          class error
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4
        cache 30
        loop
        reload
        loadbalance
    }

Logs

pig@pig202:~$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns -f
[INFO] 10.1.99.26:47204 - 29832 "AAAA IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002558s
[ERROR] plugin/errors: 2 grafana.com. AAAA: read udp 10.1.99.19:52008->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:59350 - 50446 "A IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002028s
[ERROR] plugin/errors: 2 grafana.com. A: read udp 10.1.99.19:60405->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:43050 - 13676 "AAAA IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002151s
[ERROR] plugin/errors: 2 grafana.com. AAAA: read udp 10.1.99.19:45624->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:36997 - 30359 "A IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002791s
[ERROR] plugin/errors: 2 grafana.com. A: read udp 10.1.99.19:37554->8.8.4.4:53: i/o timeout
[INFO] 10.1.99.32:57927 - 53858 "A IN google.com.mshome.net. udp 39 false 512" NOERROR - 0 2.0001987s
[ERROR] plugin/errors: 2 google.com.mshome.net. A: read udp 10.1.99.19:34079->8.8.4.4:53: i/o timeout
[INFO] 10.1.99.32:38403 - 36398 "A IN google.com.mshome.net. udp 39 false 512" NOERROR - 0 2.000224s
[ERROR] plugin/errors: 2 google.com.mshome.net. A: read udp 10.1.99.19:59835->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:57447 - 20295 "AAAA IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0001892s
[ERROR] plugin/errors: 2 grafana.com.mshome.net. AAAA: read udp 10.1.99.19:51534->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:41052 - 56059 "A IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0001879s
[ERROR] plugin/errors: 2 grafana.com.mshome.net. A: read udp 10.1.99.19:47378->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:56748 - 51804 "AAAA IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0003226s
[INFO] 10.1.99.26:45442 - 61916 "A IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0001922s
[ERROR] plugin/errors: 2 grafana.com.mshome.net. AAAA: read udp 10.1.99.19:35528->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 grafana.com.mshome.net. A: read udp 10.1.99.19:53568->8.8.8.8:53: i/o timeout
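The pattern is consistent: CoreDNS (10.1.99.19) forwards to 8.8.8.8/8.8.4.4 and times out waiting for the UDP reply, so the upstream appears unreachable from the pod network. One way to check whether the problem is pod egress rather than CoreDNS itself is to query the upstream directly from the test pod (dig ships in the dnsutils image):

# If this also times out, UDP/53 egress from the pod network is blocked,
# and the problem is not the CoreDNS configuration
kubectl exec dnsutils -it -- dig @8.8.8.8 google.com +time=2 +tries=1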

OS

pig@pig202:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"

I also tried Ubuntu 18.04.3 LTS; the issue is the same.

Other

The mshome.net search domain comes from the Hyper-V network, I assume. Perhaps this will help (see also the note after the output):

pig@pig202:~$ nmcli device show eth0
GENERAL.DEVICE:                         eth0
GENERAL.TYPE:                           ethernet
GENERAL.HWADDR:                         00:15:5D:88:26:02
GENERAL.MTU:                            1500
GENERAL.STATE:                          100 (connected)
GENERAL.CONNECTION:                     Wired connection 1
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/ActiveConnection/1
WIRED-PROPERTIES.CARRIER:               on
IP4.ADDRESS[1]:                         172.19.120.188/28
IP4.GATEWAY:                            172.19.120.177
IP4.ROUTE[1]:                           dst = 0.0.0.0/0, nh = 172.19.120.177, mt = 100
IP4.ROUTE[2]:                           dst = 172.19.120.176/28, nh = 0.0.0.0, mt = 100
IP4.ROUTE[3]:                           dst = 169.254.0.0/16, nh = 0.0.0.0, mt = 1000
IP4.DNS[1]:                             172.19.120.177
IP4.DOMAIN[1]:                          mshome.net
IP6.ADDRESS[1]:                         fe80::6b4a:57e2:5f1b:f739/64
IP6.GATEWAY:                            --
IP6.ROUTE[1]:                           dst = fe80::/64, nh = ::, mt = 100
IP6.ROUTE[2]:                           dst = ff00::/8, nh = ::, mt = 256, table=255
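That search domain also explains the google.com.mshome.net queries in the CoreDNS logs: with the default ClusterFirst DNS policy, a pod's search list contains the cluster suffixes plus the node's mshome.net, so nslookup tries the suffixed names before the bare one. The pod's actual resolver configuration can be inspected with:

# Show the search list and ndots option the pod uses
kubectl exec dnsutils -- cat /etc/resolv.conf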
Denary answered 30/6, 2020 at 19:17 Comment(1)
github.com/coredns/coredns/issues/3982 – Unicellular

I finally found the solution, which was a combination of two changes. After applying both, my pods could resolve external addresses properly.

Kubelet configuration

Based on the MicroK8s known issues, change the resolv.conf path that Kubelet uses:

# Add the resolv-conf flag to the Kubelet configuration (the args file is root-owned)
echo "--resolv-conf=/run/systemd/resolve/resolv.conf" | sudo tee -a /var/snap/microk8s/current/args/kubelet
# Restart Kubelet
sudo service snap.microk8s.daemon-kubelet restart
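To check that the flag landed and points at a usable file, you can inspect both files; the referenced resolv.conf should list the real upstream nameserver rather than the 127.0.0.53 systemd-resolved stub:

# The args file should now contain the --resolv-conf flag
cat /var/snap/microk8s/current/args/kubelet
# The referenced file should list a reachable upstream nameserver
cat /run/systemd/resolve/resolv.conf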

CoreDNS forward

Change the forward address in the CoreDNS ConfigMap from the default (8.8.8.8 8.8.4.4) to the DNS server on the eth0 device.

# Dump definition of CoreDNS
microk8s.kubectl get configmap -n kube-system coredns -o yaml > coredns.yaml

Partial content of coredns.yaml:

  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        log . {
          class error
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4
        cache 30
        loop
        reload
        loadbalance
    }

Fetch the DNS address:

# Fetch eth0 DNS address (this will print 172.19.120.177 in my case)
nmcli dev show 2>/dev/null | grep DNS | sed 's/^.*:\s*//'

Change the following line and save:

        forward . 8.8.8.8 8.8.4.4 # From this
        forward . 172.19.120.177 # To this (your DNS will probably be different)

Finally, apply the change to update CoreDNS forwarding:

microk8s.kubectl apply -f coredns.yaml
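To confirm both changes took effect, repeat the lookup that failed in the question; it should now resolve instead of returning SERVFAIL:

# The original failing lookup should now succeed
kubectl exec dnsutils -it -- nslookup google.com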
Denary answered 1/7, 2020 at 13:44 Comment(4)
Since you override --resolv-conf to point at the real file used by systemd, you can actually use forward . /etc/resolv.conf in the ConfigMap to use the host DNS; you don't have to extract the address and hard-code it. It took some tinkering, because forward . /run/systemd/resolve/resolv.conf does not work (CoreDNS won't find the file). Also, I had to restart the CoreDNS pod after updating the ConfigMap. – Churlish
@JakeCobb Should that still work? I am trying it, but as soon as I use /etc/resolv.conf or /run/.../resolv.conf instead of an IP, the CoreDNS pod fails with Readiness probe failed: Get "http://10.1.64.140:8181/ready": dial tcp 10.1.64.140:8181: connect: connection refused. It works as soon as I put the IP in forward . ip. – Darton
I had this same issue using an Ubuntu EC2 instance. To get the IP of the nameserver, run resolvectl | grep "Current DNS Server". The default setting for forward was forward . /etc/resolv.conf, but that didn't work for me. What worked instead was setting it to forward . <DNS IP>, as suggested by @Nikola Malešević. – Moke
I am running Kubernetes on Docker Desktop and have the same issue: sometimes external domains are not resolved. However, in my Corefile the forward looks like this: forward . /etc/resolv.conf {max_concurrent 1000}. Also, I cannot find the DNS IP address of the Kubernetes that runs on Docker Desktop. Any advice? – Pharisee

@Nikola Malešević's answer is great: it pointed me in the right direction and almost solved this problem for me. Unfortunately, I wasn't able to get things working with his answer alone, and while researching the problem I got a more complete picture of what you need to do on Ubuntu systems. There are a few points I'd like to highlight:

Make sure you've got the right link for /etc/resolv.conf

TL;DR: Execute sudo ln -vfns /run/systemd/resolve/resolv.conf /etc/resolv.conf.

For whatever reason, my host's /etc/resolv.conf was linked to the wrong file: ls -l /etc/resolv.conf showed /etc/resolv.conf -> /run/systemd/resolve/stub-resolv.conf, so cat /etc/resolv.conf printed:

# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
[... more options]

This nameserver was completely different from the result of resolvectl | grep "Current DNS Server", which was 10.1.0.2. Executing sudo ln -vfns /run/systemd/resolve/resolv.conf /etc/resolv.conf fixed the discrepancy.
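A quick sanity check after relinking (the nameserver shown should now match resolvectl's "Current DNS Server" instead of 127.0.0.53):

# Confirm the symlink target and the nameservers it exposes
ls -l /etc/resolv.conf
grep '^nameserver' /etc/resolv.conf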

Add 8.8.8.8 and 8.8.4.4 to the forward statement in your CoreDNS ConfigMap

I changed the forward statement in my CoreDNS ConfigMap to this: forward . /etc/resolv.conf 8.8.8.8 8.8.4.4.

I actually wanted to modify /etc/resolv.conf so that 8.8.8.8 and 8.8.4.4 were included there, but my changes to the underlying /run/systemd/resolve/resolv.conf got wiped out after a few seconds, I think because my host is an EC2 instance. The next best solution was therefore forward . /etc/resolv.conf 8.8.8.8 8.8.4.4. I don't know whether there is still a way to make persistent changes to /etc/resolv.conf, but I didn't pursue it, since this solution does what I need.
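For reference, one minimal way to apply that change (assuming the ConfigMap is named coredns in the kube-system namespace, as in the other answers):

# Edit the CoreDNS ConfigMap in place and set the forward line to:
#   forward . /etc/resolv.conf 8.8.8.8 8.8.4.4
kubectl -n kube-system edit configmap coredns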

Restart your pod once you have made your changes

For me at least, the DNS changes didn't take effect in my pod until I recreated it.
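For example (the manifest filename here is hypothetical; the coredns deployment name is the usual default):

# Recreate the test pod so it picks up the new DNS settings
kubectl delete pod dnsutils
kubectl apply -f dnsutils.yaml   # hypothetical filename for the pod manifest from the question
# After changing the ConfigMap, restarting CoreDNS also helps
kubectl -n kube-system rollout restart deployment coredns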

Borreri answered 3/3, 2023 at 12:12 Comment(0)
