Kubernetes worker node is NotReady due to CNI plugin not initialized
I'm using kind to run a test Kubernetes cluster on my local MacBook.

I found one of the nodes with status NotReady:

$ kind get clusters                                                                                                                                                                 
mc

$ kubectl get nodes
NAME                STATUS     ROLES    AGE     VERSION
mc-control-plane    Ready      master   4h42m   v1.18.2
mc-control-plane2   Ready      master   4h41m   v1.18.2
mc-control-plane3   Ready      master   4h40m   v1.18.2
mc-worker           NotReady   <none>   4h40m   v1.18.2
mc-worker2          Ready      <none>   4h40m   v1.18.2
mc-worker3          Ready      <none>   4h40m   v1.18.2

The only interesting thing in kubectl describe node mc-worker is that the CNI plugin is not initialized:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
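
To pull out just that condition without the full describe output, a jsonpath query along these lines should return the same message:

$ kubectl get node mc-worker -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'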

I have 2 similar clusters and this only occurs on this cluster.

Since kind uses the local Docker daemon to run these nodes as containers, I have already tried to restart the container (should be the equivalent of rebooting the node).
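
For reference, the restart I tried was of the kind node container itself (the container names appear to match the node names):

$ docker ps --filter "name=mc-worker" --format '{{.Names}}\t{{.Status}}'
$ docker restart mc-worker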

I have considered deleting and recreating the cluster, but there ought to be a way to solve this without recreating the cluster.

Here are the versions that I'm running:

$ kind version                                                                                                                                                                     
kind v0.8.1 go1.14.4 darwin/amd64

$ kubectl version                                                                                                                                                  
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

How do you resolve this issue?

Turbid answered 12/8, 2020 at 0:7 Comment(0)

Most likely cause:

The Docker Desktop VM is running out of some resource (CPU, memory, or disk) and cannot start the CNI on that particular node.

You can poke around in the HyperKit VM by connecting to it:

From a shell:

screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty

If that doesn't work for some reason:

docker run -it --rm --privileged --pid=host alpine nsenter -t 1 -m -u -n -i sh

Once in the VM:

# ps -Af
# free
# df -h
...
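
You can also get a rough picture from the host without entering the VM, e.g.:

$ docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}}'
$ docker system df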

Then you can always adjust the resource settings in the Docker Desktop UI:

[screenshot: Docker Desktop resource settings]

Finally, your node is, after all, running in a container, so you can connect to that container and see what errors kubelet reports:

docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                       NAMES
6d881be79f4a        kindest/node:v1.18.2   "/usr/local/bin/entr…"   32 seconds ago      Up 29 seconds       127.0.0.1:57316->6443/tcp   kind-control-plane
docker exec -it 6d881be79f4a bash
root@kind-control-plane:/# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/kind/systemd/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2020-08-12 02:32:16 UTC; 35s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 768 (kubelet)
    Tasks: 23 (limit: 2348)
   Memory: 32.8M
   CGroup: /docker/6d881be79f4a8ded3162ec6b5caa8805542ff9703fabf5d3d2eee204a0814e01/system.slice/kubelet.service
           └─768 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --fail-swap-on=false --node-ip= --fail-swap-on=false
...
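
If kubelet is active but keeps reporting cni plugin not initialized, it is worth checking (still inside the node container) whether the CNI config was ever written and what kubelet logged about it. A rough sketch, assuming kind's default kindnet CNI and that crictl is present in the node image:

root@kind-control-plane:/# ls /etc/cni/net.d/          # kindnet writes its conflist here once it starts
root@kind-control-plane:/# journalctl -u kubelet --no-pager | grep -i cni | tail -n 20
root@kind-control-plane:/# crictl ps                   # containers the runtime actually started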

✌️

Delacroix answered 12/8, 2020 at 2:19 Comment(6)
I think you're right. I stopped all containers and started only this kind cluster, and now everything comes back up okay. I tried starting another kind cluster and am having other symptoms when bringing it up. After digging into that issue, it might also be related to lack of resources. Originally I was using 8GB/8CPU, but I'll try 16GB. – Turbid
I seem to be having trouble even with 16GB. I'm getting "cannot connect to cluster: EOF" when using kubectl. Next, I'm trying to reduce and only use a single kind cluster (deleting all others). I also realized I have a Docker Kubernetes cluster running in the background, which I will also stop. – Turbid
Well, how much RAM does your machine have? – Delacroix
Overall my MacBook has 32GB. I did try the Docker daemon with 16GB when running 2 kind k8s clusters, but now I'm only running a single kind k8s cluster and lowered it back to 8GB. It seems to run okay now as long as I only run a single kind k8s cluster. – Turbid
Good to hear. I guess "cannot connect to cluster: EOF" is a different error. Maybe your kube-apiserver fails to start, or there is some issue with Docker. – Delacroix
I got err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.aliyuncs.com/google_containers/pause:3.7\"" from systemctl status kubelet, so a network problem can also cause the worker status NotReady. – Prerogative

Stopping and disabling AppArmor and restarting the containerd service on that node should solve the issue:

root@node:~# systemctl stop apparmor
root@node:~# systemctl disable apparmor 
root@node:~# systemctl restart containerd.service
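
After the restart the node should flip back to Ready within a minute or so; a quick way to confirm:

root@node:~# systemctl status containerd.service --no-pager
$ kubectl get nodes -w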
Starstudded answered 21/9, 2022 at 21:3 Comment(0)

I encountered this scenario. The master is Ready but the worker nodes are not. After some investigation, I found that /opt/cni/bin was empty: there was no network plugin on my worker node hosts. So I installed the "kubernetes-cni.x86_64" package and restarted the kubelet service. This resolved the "NotReady" status of my worker nodes.
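
Roughly, the check and the fix looked like this (assuming a yum-based host, which the .x86_64 package name suggests):

$ ls /opt/cni/bin                      # empty on the broken worker
$ sudo yum install -y kubernetes-cni
$ sudo systemctl restart kubelet
$ ls /opt/cni/bin                      # should now contain bridge, host-local, loopback, ...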

Lh answered 23/10, 2020 at 10:14 Comment(0)

If you are using an AWS EKS cluster and see the error "CNI plugin not initialized" or nodes not joining the Kubernetes cluster, please make sure you have the correct add-ons installed.

  1. Navigate to the EKS cluster in the AWS console
  2. Go to the add-ons section.
  3. Install the following add-ons: CoreDNS, kube-proxy, and VPC CNI. Make sure to set the "Conflict resolution method" to "Override".

This should solve the issue. I hope this saves someone a couple of hours in the future, especially since AWS is pushing Kubernetes version updates quickly.
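
The same add-ons can also be installed from the AWS CLI; a sketch, with a placeholder cluster name:

$ aws eks create-addon --cluster-name my-cluster --addon-name vpc-cni    --resolve-conflicts OVERWRITE
$ aws eks create-addon --cluster-name my-cluster --addon-name coredns    --resolve-conflicts OVERWRITE
$ aws eks create-addon --cluster-name my-cluster --addon-name kube-proxy --resolve-conflicts OVERWRITE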

Truant answered 4/7 at 9:18 Comment(0)

In my case, I added the first node group to my new EKS cluster, and its status was Failed, with this message in the logs. None of the above solved my problem; I was already using the latest CNI add-on. My problem was the role I created for the node group: it had AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly, but I had forgotten to add AmazonEKS_CNI_Policy. After adding this policy, my problem was solved.
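
For anyone hitting the same thing, the missing policy can also be attached from the AWS CLI; a sketch, with a placeholder role name:

$ aws iam attach-role-policy \
    --role-name my-eks-node-group-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy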

Kasey answered 5/12, 2023 at 23:32 Comment(1)
This was my problem. Thanks for the tip. – Showman

I got the same problem; I found it was due to the CNI network plugins not being installed.

  1. Check the CNI network; there must be a cni0 interface:

    root@k8s-main:/home/zz/k8s# ifconfig
    cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
            inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
            inet6 fe80::a85a:79ff:fe3f:a28e  prefixlen 64  scopeid 0x20
            ether aa:5a:79:3f:a2:8e  txqueuelen 1000  (Ethernet)
            RX packets 1295  bytes 107173 (107.1 KB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 1451  bytes 157776 (157.7 KB)
            TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

  2. If you cannot see the cni0 interface above, install the CNI plugins and flannel (spelled out as commands after this list):
     2.1 Download the CNI plugins Linux package: https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
     2.2 Unpack cni-plugins-linux-amd64-v0.8.6.tgz into /opt/cni/bin
     2.3 Download the flannel manifest: https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
     2.4 kubectl apply -f kube-flannel.yml
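
Spelled out as commands, step 2 is roughly (URLs as given above; run on the worker host):

    $ curl -L -O https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
    $ sudo mkdir -p /opt/cni/bin
    $ sudo tar -xzf cni-plugins-linux-amd64-v0.8.6.tgz -C /opt/cni/bin
    $ curl -L -O https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
    $ kubectl apply -f kube-flannel.yml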

Then everything should run smoothly.

Serafinaserafine answered 16/5 at 7:11 Comment(0)
