NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x - Error
I get this error after I start the worker node VMs (Kubernetes) from the AWS console. I am using PKS (Pivotal Container Service).

network for pod "xxxxx": NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from 10.x.x.x/xx

I suppose that Flannel assigns a subnet lease to the workers in the cluster which expires after 24 hours; after that, the flannel.1 and cni0 /24 subnets no longer match, which causes this issue.
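For reference, a quick way to check whether a worker has drifted into this state (the subnet.env path below is an assumption based on a standard Flannel setup; on PKS workers the file may live elsewhere):

# Subnet that flannel currently holds a lease for on this node
grep FLANNEL_SUBNET /run/flannel/subnet.env
# Address actually configured on the cni0 bridge
ip -4 addr show cni0 | grep inet
# If the two /24s differ, the node has hit the mismatch described above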

I also know a workaround:

bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld" 
bosh ssh -d worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db" 
bosh ssh -d worker -c "sudo /var/vcap/bosh/bin/monit restart all"

However, is there any permanent fix for this?

Dashtilut asked 22/4, 2020 at 19:17. Comments (3):
Hi, it seems like you need to reinitialize the pod network every time the node starts up. Maybe the --pod-network-cidr value is incorrect. Are these worker nodes the same version as the master nodes? Can you share the k8s versions of the nodes and the flannel CNI version? - Outoftheway
Yup, they are the same version. - Dashtilut
I am using K8s version 1.15.5, and I see that cni0 and flannel.1 have IPs that are not from the same subnet. My pod network CIDR is 10.200.0.0/16. On one of the worker nodes, the ifconfig output is: cni0 inet addr:10.200.28.1 Bcast:10.200.28.255 Mask:255.255.255.0; flannel.1 inet addr:10.200.42.0 Bcast:0.0.0.0 Mask:255.255.255.255. subnet.env has: FLANNEL_NETWORK=10.200.0.0/16 FLANNEL_SUBNET=10.200.42.1/24 FLANNEL_MTU=8951 FLANNEL_IPMASQ=true - Dashtilut

TL;DR - recreate network

$ ip link set cni0 down
$ brctl delbr cni0  

Or, as @ws_ suggested in the comments, remove the interfaces and restart the k8s services:

ip link set cni0 down && ip link set flannel.1 down 
ip link delete cni0 && ip link delete flannel.1
systemctl restart containerd && systemctl restart kubelet

Community solutions

It is a known issue, and there are a few community solutions to fix it.

Solution by filipenv is:

On the master and the slave nodes:

$ kubeadm reset
$ systemctl stop kubelet
$ systemctl stop docker
$ rm -rf /var/lib/cni/
$ rm -rf /var/lib/kubelet/*
$ rm -rf /etc/cni/
$ ifconfig cni0 down
$ ifconfig flannel.1 down
$ ifconfig docker0 down

(You may need to manually unmount filesystems from /var/lib/kubelet before calling rm on that dir.) After doing that, I started docker and kubelet back up again and restarted the kubeadm process.
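A minimal sketch of that unmount step (the grep pattern is an assumption; check what mount actually reports on your node):

# List mounts still active under /var/lib/kubelet (secrets, projected volumes, etc.)
mount | grep '/var/lib/kubelet' | awk '{print $3}'
# Unmount each of them before removing the directory
mount | grep '/var/lib/kubelet' | awk '{print $3}' | xargs -r umount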

aysark, and the kubernetes-handbook in its recipe for "Pod stuck in Waiting or ContainerCreating", both recommend:

$ ip link set cni0 down
$ brctl delbr cni0  

Some workarounds from Flannel's KB article

There is also an article in Flannel's KB: "PKS Flannel network gets out of sync with docker bridge network (cni0)".

Workaround 1:

WA1 is just like yours:

    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
    bosh ssh -d <deployment_name> worker -c "sudo rm /var/vcap/store/docker/docker/network/files/local-kv.db"
    bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart all"

Workaround 2:

If WA1 didn't help, the KB recommends:

    1. bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit stop flanneld"
    2. bosh ssh -d <deployment_name> worker -c "ifconfig | grep -A 1 flannel"
    3. On a master node, get access to etcd using the referenced KB.
    4. On a master node, run `etcdctlv2 ls /coreos.com/network/subnets/`
    5. Remove all the worker subnet leases from etcd by running `etcdctlv2 rm /coreos.com/network/subnets/<worker_subnet>` for each of the worker subnets from step 2 above (see the sketch after these steps).
    6. bosh ssh -d <deployment_name> worker -c "sudo /var/vcap/bosh/bin/monit restart flanneld"
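For illustration, a hedged sketch of the etcd clean-up in steps 4-5 (assuming etcdctlv2 is a wrapper around ETCDCTL_API=2 etcdctl, and using the 10.200.28.0/24 subnet from the comments above purely as an example key):

    # List all subnet leases flannel has recorded for the cluster
    etcdctlv2 ls /coreos.com/network/subnets/
    # Remove the stale lease for each worker subnet found above, e.g.:
    etcdctlv2 rm /coreos.com/network/subnets/10.200.28.0-24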
Marine answered 1/5, 2020 at 14:14. Comments (5):
This works for me (k8s 1.21 with containerd): 1. ip link set cni0 down && ip link set flannel.1 down 2. ip link delete cni0 && ip link delete flannel.1 3. systemctl restart containerd && systemctl restart kubelet - Malina
Your comment worked for me as well. Thanks @Malina - Humane
You saved me, dude. This is working perfectly in 1.23.x - Undone
It also worked for me, but I had to remove the flannel pod running on this node. - Shapeless
I used this in 1.29 and removing those interfaces worked for me. Didn't need to restart crio or kubelet. - Gosnell

I am running Docker with Kubernetes. I did the following on all my master and slave nodes and got my cluster working:

sudo su
ip link set cni0 down && ip link set flannel.1 down 
ip link delete cni0 && ip link delete flannel.1
systemctl restart docker && systemctl restart kubelet
Prolocutor answered 23/4, 2022 at 15:24.
