Kubernetes' container creation with flannel gets stuck in "ContainerCreating"-state

Context

I installed Docker following this instruction on my Ubuntu 18.04 LTS (Server) and then installed Kubernetes via kubeadm. After initializing (kubeadm init --pod-network-cidr=10.10.10.10/24) and joining a second node (a two-node cluster to start with), I cannot get coredns or the later-applied Web UI (Dashboard) into status Running.

As pod network I tried both Flannel (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml) and Weave Net, but nothing changed. The pods still show status ContainerCreating, even after hours of waiting.

Question

Why doesn't the container creation work as expected and what might be the root cause for this? And most importantly: How do I solve this?

Edit

Summing up my answer below, here are the reasons why:

Docker used the cgroupfs cgroup driver instead of systemd
I did not configure iptables correctly
I used a wrong kubeadm init, since Flannel's standard YAML requires --pod-network-cidr to be 10.244.0.0/16
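A side observation on the third point: the CIDR originally passed to kubeadm init, 10.10.10.10/24, also contains the master's own IP, and the kubeadm documentation advises that the pod network should not overlap with host networks. The following is only a minimal sketch for checking such an overlap with plain shell arithmetic; the helper names ip_to_int and cidr_contains are made up for this illustration and are not part of kubeadm or Flannel.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any Kubernetes tooling).

# ip_to_int IP: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# cidr_contains CIDR IP: succeed (exit 0) if IP lies inside CIDR.
cidr_contains() (
  net=${1%/*}          # part before the slash
  bits=${1#*/}         # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$net") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
)

# Example usage (node IP taken from the question):
#   cidr_contains 10.10.10.10/24 10.10.10.10 && echo "pod CIDR overlaps the node IP"
#   cidr_contains 10.244.0.0/16  10.10.10.10 || echo "no overlap"
```

The check only compares the masked network parts, so it does not validate the input; malformed addresses or octets with leading zeros are outside what this sketch handles.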

Seismography answered 4/2, 2020 at 0:38

Since answering this question took me a lot of time, I wanted to share what got me out of this. There might be more code than necessary, but I also want it all in one place in case I or someone else has to redo all the steps.

First it all started with Docker...

I figured out that it presumably all started with the way I installed Docker. Following the linked online instructions, I installed Docker with sudo apt-get install docker.io and added my user to the docker group with sudo usermod -aG docker $USER. That left Docker running with its default cgroupfs cgroup driver.

Well, taking a look at the official instructions from Kubernetes, this was a mistake: systemd is the recommended cgroup driver!

So I completely purged everything I had ever done with Docker by following these great instructions from Mayur Bhandare:

sudo apt-get purge -y docker-engine docker docker.io docker-ce  
sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce  
sudo rm -rf /var/lib/docker /etc/docker
sudo rm /etc/apparmor.d/docker
sudo groupdel docker
sudo rm -rf /var/run/docker.sock

# Reboot to be sure

Afterwards I reinstalled Docker the official way (keep in mind that this might change in the future):

# Install Docker CE (the commands below have no sudo, so run them as root, e.g. in a sudo -i shell)
## Set up the repository:
### Install packages to allow apt to use a repository over HTTPS
apt-get update && apt-get install -y \
  apt-transport-https ca-certificates curl software-properties-common gnupg2

### Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

### Add Docker apt repository.
add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) \
  stable"

## Install Docker CE.
apt-get update && apt-get install -y \
  containerd.io=1.2.10-3 \
  docker-ce=5:19.03.4~3-0~ubuntu-$(lsb_release -cs) \
  docker-ce-cli=5:19.03.4~3-0~ubuntu-$(lsb_release -cs)

# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart docker.
systemctl daemon-reload
systemctl restart docker

Note that this explicitly uses systemd!
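To double-check that the daemon.json written above really requests the systemd driver, something like the following could be used. It is a minimal sketch: the function name configured_cgroup_driver is invented for this example, and it only inspects the configuration file, not the running daemon.

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup driver requested via "exec-opts"
# in a Docker daemon.json, or "unset" if the file is missing or has no such option.
configured_cgroup_driver() (
  driver=$(sed -n 's/.*native\.cgroupdriver=\([a-z]*\).*/\1/p' "$1" 2>/dev/null | head -n 1)
  echo "${driver:-unset}"
)

# Example usage:
#   configured_cgroup_driver /etc/docker/daemon.json
#
# What the running daemon actually uses can be queried with:
#   docker info --format '{{.CgroupDriver}}'
```

If the file says systemd but docker info still reports cgroupfs, the daemon most likely has not been restarted since the file was written.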

... and then it went on with Flannel...

Above I wrote that my sudo kubeadm init was done with --pod-network-cidr=10.10.10.10/24, since 10.10.10.10 was the IP of my master. Well, as pointed out here, not using the officially recommended --pod-network-cidr=10.244.0.0/16 results in errors, for example when using kubectl proxy or during container creation, if you apply the provided manifest with kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml. This is because 10.244.0.0/16 is hard-coded in the .yaml and hence mandatory, unless you change it in the .yaml yourself.
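For anyone who prefers to keep a different pod CIDR, the hard-coded value sits in the "Network" field of net-conf.json inside kube-flannel.yml. The following is a rough sketch of how one might read it, or produce a modified copy, from a locally downloaded manifest. The function names flannel_network and set_flannel_network are hypothetical, and the sed patterns assume the field is written on one line as "Network": "a.b.c.d/n".

```shell
#!/bin/sh
# Hypothetical helpers operating on a local copy of kube-flannel.yml.

# flannel_network FILE: print the pod network CIDR configured in the manifest.
flannel_network() (
  sed -n 's/.*"Network": *"\([0-9./]*\)".*/\1/p' "$1" | head -n 1
)

# set_flannel_network FILE CIDR: print the manifest to stdout with the
# Network value replaced by CIDR (the input file is left untouched).
set_flannel_network() (
  sed 's|\("Network": *\)"[0-9./]*"|\1"'"$2"'"|' "$1"
)

# Example usage:
#   curl -fsSLo kube-flannel.yml <manifest URL from above>
#   flannel_network kube-flannel.yml          # compare with --pod-network-cidr
#   set_flannel_network kube-flannel.yml 10.10.0.0/16 > kube-flannel-custom.yml
#   kubectl apply -f kube-flannel-custom.yml
```

Either way, the value in the manifest and the --pod-network-cidr passed to kubeadm init have to agree.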

To get rid of the faulty configuration I did a full reset. This can be achieved using sudo kubeadm reset and deleting the config with sudo rm -r ~/.kube/config. However, since I had messed things up so badly, I went further and uninstalled and reinstalled kubeadm, making sure it used iptables in legacy mode this time (which I had also forgotten to do before...).

Here is a nice link on how to fully uninstall all kubeadm parts.

sudo kubeadm reset
# Quote the pattern so the shell does not expand it against local files
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo apt-get autoremove
sudo rm -rf ~/.kube

For the sake of completeness, here is the reinstall as well:

# ensure legacy binaries are installed
sudo apt-get install -y iptables arptables ebtables

# switch to legacy versions
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy

# Install Kubernetes with kubeadm
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Reboot
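To confirm that the switch to the legacy backend took effect, the version string printed by iptables can be inspected. This is just a small sketch; the function name iptables_backend is made up here, and older iptables releases that print no backend suffix are reported as "unknown".

```shell
#!/bin/sh
# Hypothetical helper: classify the output of `iptables --version`,
# which looks like "iptables v1.8.4 (legacy)" or "iptables v1.8.4 (nf_tables)".
iptables_backend() {
  case "$1" in
    *"(legacy)"*)    echo legacy ;;
    *"(nf_tables)"*) echo nf_tables ;;
    *)               echo unknown ;;
  esac
}

# Example usage:
#   iptables_backend "$(iptables --version)"
```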

... and finally it worked!

After the clean reinstallation I did the following:

# Initialize with correct cidr
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

And then be astounded by the result:

kubectl get pods --all-namespaces
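On a cluster with many pods it can help to filter that listing down to the ones that are still stuck. A minimal sketch, assuming the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...); the function name pods_not_running is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on stdin
# and print "namespace/name status" for every pod that is neither Running nor Completed.
pods_not_running() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example usage:
#   kubectl get pods --all-namespaces | pods_not_running
```

An empty result means every listed pod has reached Running (or Completed); for anything that does show up, kubectl describe pod on that pod usually reveals the underlying event.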

On a side note: this also resolved the "/run/flannel/subnet.env: no such file or directory" error I encountered prior to these steps when describing the uncreated coredns pod.
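Whether Flannel has written that file on a given node can be checked directly. The sketch below is only illustrative: flannel_subnet_env is a hypothetical name, and it assumes the usual KEY=value layout of subnet.env with FLANNEL_NETWORK and FLANNEL_SUBNET entries.

```shell
#!/bin/sh
# Hypothetical helper: print the network settings Flannel wrote for this node,
# or fail with a message if the file does not exist yet.
flannel_subnet_env() {
  _file=${1:-/run/flannel/subnet.env}
  if [ ! -f "$_file" ]; then
    echo "missing: $_file" >&2
    return 1
  fi
  grep -E '^FLANNEL_(NETWORK|SUBNET)=' "$_file"
}

# Example usage (run on the node itself):
#   flannel_subnet_env
```

With the standard manifest, FLANNEL_NETWORK should match the 10.244.0.0/16 passed to kubeadm init.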

Seismography answered 4/2, 2020 at 0:38

So I had the same issue as stated above. For me, this was the perfect solution, but other pods were also stuck in either Pending or ContainerCreating. In addition to the fix above, my flannel had hit an error I hadn't noticed, so I needed to rerun the flannel apply:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Dignadignified answered 26/8, 2020 at 5:38
