CoreDNS fails to run in Kubernetes cluster

I'm trying to set up a Kubernetes cluster, but I cannot get CoreDNS running. I ran the following to start the cluster:

sudo swapoff -a
sudo sysctl net.bridge.bridge-nf-call-iptables=1
sudo kubeadm init

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s- version=$(kubectl version | base64 | tr -d '\n')"
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

Checking the pods with kubectl get pods --all-namespaces, I get

NAMESPACE     NAME                                    READY   STATUS             RESTARTS   AGE
kube-system   coredns-68fb79bcf6-6s5bp                0/1     CrashLoopBackOff   6          10m
kube-system   coredns-68fb79bcf6-hckxq                0/1     CrashLoopBackOff   6          10m
kube-system   etcd-myserver                           1/1     Running            0          79m
kube-system   kube-apiserver-myserver                 1/1     Running            0          79m
kube-system   kube-controller-manager-myserver        1/1     Running            0          79m
kube-system   kube-proxy-9ls64                        1/1     Running            0          80m
kube-system   kube-scheduler-myserver                 1/1     Running            0          79m
kube-system   kubernetes-dashboard-77fd78f978-tqt8m   1/1     Running            0          80m
kube-system   weave-net-zmhwg                         2/2     Running            0          80m

So CoreDNS keeps crashing. The only error messages I could find were in /var/log/syslog:

Oct  4 18:06:44 myserver kubelet[16397]: E1004 18:06:44.961409   16397 pod_workers.go:186] Error syncing pod c456a48b-c7c3-11e8-bf23-02426706c77f ("coredns-68fb79bcf6-6s5bp_kube-system(c456a48b-c7c3-11e8-bf23-02426706c77f)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=coredns pod=coredns-68fb79bcf6-6s5bp_kube-system(c456a48b-c7c3-11e8-bf23-02426706c77f)"

and from kubectl logs coredns-68fb79bcf6-6s5bp -n kube-system:

.:53
2018/10/04 11:04:55 [INFO] CoreDNS-1.2.2
2018/10/04 11:04:55 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/10/04 11:04:55 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/10/04 11:04:55 [FATAL] plugin/loop: Seen "HINFO IN 3256902131464476443.1309143030470211725." more than twice, loop detected

Some solutions I found are to issue

kubectl -n kube-system get deployment coredns -o yaml | \
sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
kubectl apply -f -

and to modify /etc/resolv.conf to point to an actual DNS server rather than to localhost, which I tried as well.
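
For reference, on Ubuntu with systemd-resolved the loop is visible at a glance: the stub file hands out a loopback address, while the "real" file lists the upstream servers. A quick check (the paths are the systemd-resolved defaults and may differ on other setups):

cat /etc/resolv.conf
# nameserver 127.0.0.53          <- loopback stub; CoreDNS ends up forwarding to itself
cat /run/systemd/resolve/resolv.conf
# nameserver <upstream address>  <- the actual upstream nameservers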

The issue is described at https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#pods-in-runcontainererror-crashloopbackoff-or-error-state and I tried many different pod networks, but none of them helped.

I've run sudo kubeadm reset && rm -rf ~/.kube/ && sudo kubeadm init several times.

I'm running Ubuntu 16.04, Kubernetes 1.12 and Docker 17.03. Any ideas?

Cabbageworm answered 4/10, 2018 at 11:23 Comment(3)
Turns out the problem was a loop created because the nameserver in resolv.conf pointed to localhost while systemd-resolved was running. So I stopped the service and manually added nameservers through Ubuntu's network settings. Instead of this, I wanted to specify the location of resolv.conf for CoreDNS, but I could not figure out how to do it. I tried to specify the location through a config file with kubeadm init --config config.yaml, but kubelet/CoreDNS didn't pick up the file. I'd still be interested in how to specify an alternative location of resolv.conf to kubeadm.Cabbageworm
Just a follow-up for those who are still having this error: coredns:1.2.6 does not have this problem, without any additional config. So the easiest way to fix it is to upgrade coredns to 1.2.6 using this command: kubectl patch deployment -n=kube-system coredns -p '{"spec": {"template": {"spec":{"containers":[{"image":"k8s.gcr.io/coredns:1.2.6", "name":"coredns","resources":{"limits":{"memory":"1Gi"},"requests":{"cpu":"100m","memory":"70Mi"}}}]}}}}'Crabbing
@RichardLi Doesn't work for meElery

I also had the same issue.

I solved the problem by deleting the 'loop' plugin from the coredns ConfigMap, but I don't know if this could cause other problems.

1. kubectl edit cm coredns -n kube-system

2. Delete the 'loop' line, save and exit (see the sketch below).

3. Restart the coredns pods: kubectl delete pod coredns.... -n kube-system
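
For context, the ConfigMap holds a Corefile. A sketch of what the kubeadm-generated Corefile for this release typically looks like follows (verify against your own cluster's copy, which may differ slightly):

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    loop          # <- deleting this line disables loop detection
    reload
    loadbalance
}

As the comments below point out, this removes the safeguard rather than fixing the underlying resolv.conf loop.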

Gestalt answered 9/10, 2018 at 13:41 Comment(4)
This worked for me, but do we know why it works? Are we losing anything by removing it?Wichman
From kubernetes.io/docs/tasks/administer-cluster/…: Some Linux distributions (e.g. Ubuntu) use a local DNS resolver by default (systemd-resolved). systemd-resolved moves and replaces /etc/resolv.conf with a stub file that can cause a fatal forwarding loop when resolving names in upstream servers. This can be fixed manually by using kubelet's --resolv-conf flag to point to the correct resolv.conf (with systemd-resolved, this is /run/systemd/resolve/resolv.conf).Incrustation
Use KUBE_EDITOR="nano" before kubectl edit .... to edit within nano instead of vimMetrify
I tried this and CPU utilization from coredns was very high. Removing loop means coredns will have no way to detect dns resolution loops. Removing loop isn't recommended.Hutchens

I somewhat resolved it this way:

  1. Open and edit the ConfigMap of coredns:

     kubectl edit cm coredns -n kube-system

  2. Replace proxy . /etc/resolv.conf with the IP address of your upstream DNS, for example proxy . 8.8.8.8, per the link in the output of the coredns log (at the end of the page); see the sketch below.
  3. Save and exit.
  4. Recreate the coredns pods: kubectl get pods -n kube-system -oname |grep coredns |xargs kubectl delete -n kube-system
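
If you prefer not to edit interactively, here is a non-interactive sketch of the same change, mirroring the sed pipeline from the question (8.8.8.8 is an example upstream; substitute your own):

kubectl -n kube-system get configmap coredns -o yaml | \
  sed 's#proxy . /etc/resolv.conf#proxy . 8.8.8.8#' | \
  kubectl apply -f -
# recreate the pods (k8s-app=kube-dns is the label the CoreDNS pods carry)
kubectl -n kube-system delete pod -l k8s-app=kube-dns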

The reason for the problem is explained at that link, which you can find in the output of this command:

kubectl logs coredns-7d9cd4f75b-cpwxp -n kube-system

This link appears in the output of CoreDNS-1.2.4.

I upgraded CoreDNS using this command:

kubectl patch deployment -n=kube-system coredns -p '{"spec": {"template": {"spec":{"containers":[{"image":"k8s.gcr.io/coredns:1.2.4", "name":"coredns","resources":{"limits":{"memory":"1Gi"},"requests":{"cpu":"100m","memory":"70Mi"}}}]}}}}'

Crabbing answered 21/10, 2018 at 2:57 Comment(1)
This fix worked for me, though proxy . /etc/resolv.conf was absent from my ConfigMap; I added forward . 8.8.8.8 and it worked like a charm!Bartolome

I think simply deleting the loop plugin from the CoreDNS config is not a clean approach. The CoreDNS GitHub repo actually provides guidelines for troubleshooting this issue.

They suggest three approaches in their guideline (a sketch of the first one follows this list):

  • Add the --resolv-conf flag to kubelet, pointing it at your "real" resolv.conf, i.e. the one that contains the actual IPs of your upstream servers and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can differ depending on your distribution.
  • Disable the local DNS cache on host nodes, and restore /etc/resolv.conf to the original.
  • A quick and dirty fix is to edit your Corefile, replacing proxy . /etc/resolv.conf with the IP address of your upstream DNS, for example proxy . 8.8.8.8. But this only fixes the issue for CoreDNS; kubelet will continue to forward the invalid resolv.conf to all Pods with the default dnsPolicy, leaving them unable to resolve DNS.
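
A sketch of the first approach on a kubeadm cluster with systemd-resolved (the env-file path is kubeadm's default; adjust if your setup differs):

# add or adjust the flag in kubelet's args, e.g.
#   KUBELET_KUBEADM_ARGS="... --resolv-conf=/run/systemd/resolve/resolv.conf"
sudoedit /var/lib/kubelet/kubeadm-flags.env
sudo systemctl restart kubelet
# recreate the CoreDNS pods so they pick up the clean resolv.conf
kubectl -n kube-system delete pod -l k8s-app=kube-dns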
Barron answered 30/1, 2019 at 21:19 Comment(1)
After applying the above workaround to cleanly update /etc/resolv.conf on Ubuntu 18.04 LTS, I deleted the coredns pod, which fixed the CrashLoopBackOff: kubectl delete pod -n=kube-system coredns-fb8b8dccf-8ggcfMonsour

My solution was to delete --network-plugin=cni from /var/lib/kubelet/kubeadm-flags.env. A sketch of the edit follows.
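
Sketched here with hypothetical surrounding flags; only the removal of --network-plugin=cni is the point, and kubelet must be restarted afterwards:

# Before: KUBELET_KUBEADM_ARGS="--cni-bin-dir=/opt/cni/bin --network-plugin=cni ..."
# After:  KUBELET_KUBEADM_ARGS="--cni-bin-dir=/opt/cni/bin ..."
sudoedit /var/lib/kubelet/kubeadm-flags.env
sudo systemctl restart kubelet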

Kimkimball answered 24/9, 2019 at 9:31 Comment(1)
After removing --network-plugin=cni from /var/lib/kubelet/kubeadm-flags.env, it still shows running pods.Deutschland

Yes, you are right. The issue was described here and on GitHub, and the resolution is to upgrade Docker, disable SELinux, or set allowPrivilegeEscalation to true. However, today I tried to reproduce your issue and wasn't able to. I'm providing my commands and output below; maybe it will help you create a working version from the beginning.

Docker version 17.03.2-ce, Kubernetes v1.12.0, Ubuntu 16.04, CoreDNS-1.2.2, instance created in GCP.

#apt-get update && apt-get install -y mc ebtables ethtool docker.io apt-transport-https curl
#curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

#cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

#apt-get update && apt-get install -y kubelet kubeadm kubectl

#kubeadm init
$mkdir -p $HOME/.kube
$sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$sudo chown $(id -u):$(id -g) $HOME/.kube/config
$kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created

$kubectl get pods --all-namespaces
NAMESPACE     NAME                                              READY   STATUS              RESTARTS   AGE
kube-system   pod/coredns-576cbf47c7-6qbtq                      0/1     Pending             0          79s
kube-system   pod/coredns-576cbf47c7-jr6hb                      0/1     Pending             0          79s
kube-system   pod/etcd-kube-weave-master-1                      1/1     Running             0          38s
kube-system   pod/kube-apiserver-kube-weave-master-1            1/1     Running             0          28s
kube-system   pod/kube-controller-manager-kube-weave-master-1   1/1     Running             0          30s
kube-system   pod/kube-proxy-4p9l5                              1/1     Running             0          79s
kube-system   pod/kube-scheduler-kube-weave-master-1            1/1     Running             0          34s
kube-system   pod/weave-net-z6mhw                               0/2     ContainerCreating   0          8s

And again in a minute:

$kubectl get pods --all-namespaces
NAMESPACE     NAME                                              READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-576cbf47c7-6qbtq                      1/1     Running   0          98s
kube-system   pod/coredns-576cbf47c7-jr6hb                      1/1     Running   0          98s
kube-system   pod/etcd-kube-weave-master-1                      1/1     Running   0          57s
kube-system   pod/kube-apiserver-kube-weave-master-1            1/1     Running   0          47s
kube-system   pod/kube-controller-manager-kube-weave-master-1   1/1     Running   0          49s
kube-system   pod/kube-proxy-4p9l5                              1/1     Running   0          98s
kube-system   pod/kube-scheduler-kube-weave-master-1            1/1     Running   0          53s
kube-system   pod/weave-net-z6mhw                               2/2     Running   0          27s

CoreDNS pod description:

kubectl describe pod/coredns-576cbf47c7-6qbtq -n kube-system
Name:               coredns-576cbf47c7-6qbtq
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               kube-weave-master-1/10.154.0.8
Start Time:         Fri, 05 Oct 2018 11:06:54 +0000
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        <none>
Status:             Running
IP:                 10.32.0.3
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://db1712600b4c927b99063fa41bc36c3346c55572bd63730fc993f03379fa457b
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Fri, 05 Oct 2018 11:06:57 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-wp7tm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-wp7tm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-wp7tm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                          Message
  ----     ------            ----                ----                          -------
  Warning  FailedScheduling  23m (x12 over 24m)  default-scheduler             0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         23m                 default-scheduler             Successfully assigned kube-system/coredns-576cbf47c7-6qbtq to kube-weave-master-1
  Normal   Pulled            23m                 kubelet, kube-weave-master-1  Container image "k8s.gcr.io/coredns:1.2.2" already present on machine
  Normal   Created           23m                 kubelet, kube-weave-master-1  Created container
  Normal   Started           23m                 kubelet, kube-weave-master-1  Started container

Also, please provide the config.yaml you use with kubeadm init --config config.yaml, so we can better understand your problem with specifying the config file location.
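
For reference, the kind of config being asked about would look roughly like the sketch below (kubeadm.k8s.io/v1alpha3 matches the kubeadm release used here; the resolvConf path shown is the typical systemd-resolved location, substitute your own):

apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf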

Xavler answered 5/10, 2018 at 12:9 Comment(1)
I dumped the configuration from the master, but feeding the full configuration back into init caused an error, so that's another problem. Anyway, I stripped the config down to a minimum and this is what is left (| for new line): apiVersion: kubeadm.k8s.io/v1alpha3|kind: InitConfiguration|---|apiVersion: kubelet.config.k8s.io/v1beta1|kind: KubeletConfiguration|resolvConf: /path_to_my_config/resolv.conf However, kubeadm doesn't pick up the resolvConf variable, and the --resolv-conf flag in /var/lib/kubelet/kubeadm-flags.env still points to /etc/run/resolvconf/resolv.conf.Cabbageworm

My solution was to delete --network-plugin=cni from /var/lib/kubelet/kubeadm-flags.env and then restart the machine; after that CoreDNS was running. Good luck.

Businesswoman answered 28/1, 2020 at 4:49 Comment(0)
