Joining cluster takes forever

I have set up my master node and I am trying to join a worker node as follows:

kubeadm join 192.168.30.1:6443 --token 3czfua.os565d6l3ggpagw7 --discovery-token-ca-cert-hash sha256:3a94ce61080c71d319dbfe3ce69b555027bfe20f4dbe21a9779fd902421b1a63

However, the command hangs forever in the following state:

[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

Since this is just a warning, why does it actually fail?

Edit: I noticed the following in my /var/log/syslog:

Mar 29 15:03:15 ubuntu-xenial kubelet[9626]: F0329 15:03:15.353432    9626 server.go:193] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Mar 29 15:03:15 ubuntu-xenial systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Mar 29 15:03:15 ubuntu-xenial systemd[1]: kubelet.service: Unit entered failed state.
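
A quick way to confirm on the worker that the kubelet is the failing piece is shown below (a minimal diagnostic sketch; the paths are the kubeadm defaults):

systemctl status kubelet
journalctl -u kubelet --no-pager | tail -n 20
# /var/lib/kubelet/config.yaml is only written by a successful 'kubeadm join',
# so it is expected to be missing until the join completes
ls -l /var/lib/kubelet/config.yaml
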
Bilbo answered 29/3, 2019 at 5:10 Comment(0)

The problem had to do with kubeadm not installing a CNI-compatible networking solution out of the box.

Without this step, the Kubernetes master and worker nodes are unable to establish any form of communication.

The following task addressed the issue:

- name: kubernetes.yml --> Install Flannel
  shell: kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
  become: yes
  environment:
    KUBECONFIG: "/etc/kubernetes/admin.conf"
  when: inventory_hostname in (groups['masters'] | last)
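
If you are not using Ansible, roughly the same step can be run by hand on the master (a sketch using the same manifest URL as the task above), after which the nodes should move to the Ready state:

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
# Watch the flannel pods come up (Ctrl-C to stop), then check node readiness
kubectl get pods -n kube-system -w
kubectl get nodes
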
Bilbo answered 1/4, 2019 at 13:39 Comment(1)
when: with run_once is better than last – Starlastarlene

First, if you want to see more detail when your worker joins the master, use:

kubeadm join 192.168.1.100:6443 --token m3jfbb.wq5m3pt0qo5g3bt9     --discovery-token-ca-cert-hash sha256:d075e5cc111ffd1b97510df9c517c122f1c7edf86b62909446042cc348ef1e0b --v=2

Using the above command I could see that my worker could not establish a connection with the master, so I just stopped the firewall:

systemctl stop firewalld 
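
Stopping firewalld works, but a less drastic alternative is to open only the ports kubeadm needs (a sketch; 6443 and 10250 are the default control-plane and kubelet ports listed in the kubeadm documentation):

# On the master: Kubernetes API server
firewall-cmd --permanent --add-port=6443/tcp
# On every node: kubelet API
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --reload
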
Photodrama answered 8/9, 2019 at 2:8 Comment(0)

This can be solved by creating a new token using this command:

kubeadm token create --print-join-command

and use the join command it prints to join other nodes to the cluster.
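
Bootstrap tokens expire after 24 hours by default, so a join command copied from an old kubeadm init output may no longer work. A short sketch of checking and regenerating on the master:

# List existing bootstrap tokens and their expiration times
kubeadm token list
# Generate a fresh token and print the complete join command for the workers
kubeadm token create --print-join-command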

Asha answered 18/6, 2021 at 5:55 Comment(0)

I got the same warning on CentOS 7, but in my case the join command worked without problems, so it was indeed just a warning.

> [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
> [preflight] Reading configuration from the cluster...
> [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
> [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace

As the official documentation mentions, there are two common issues that make init hang (I guess this also applies to the join command):

The default cgroup driver configuration for the kubelet differs from that used by Docker. Check the system log file (e.g. /var/log/messages) or examine the output of journalctl -u kubelet for a cgroup driver mismatch error.
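
If the drivers really do differ, a commonly documented fix (a sketch that assumes Docker is the container runtime) is to switch Docker to the systemd cgroup driver and restart both services:

cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl restart kubelet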

First try the steps from the official documentation, and if that does not work, please provide more information so we can troubleshoot further.

Woermer answered 29/3, 2019 at 14:31 Comment(2)
It seems this issue is related to the CRI configuration on the master. I have no problem creating the cluster; the join command is the one that fails. However, I have installed Docker the exact same way on the master and the nodes. – Bilbo
I posted the results of the join command, so it works in my case even with this warning. Is this a clean cluster, i.e. can you just reset it? If yes, try kubeadm reset and reinstall Docker from the official documentation site (have you also tried this)? – Woermer

I had a bunch of k8s deployment scripts that broke recently with this same error message... it looks like Docker changed its install. Try this:

previous install: apt-get install docker-ce

updated install: apt-get install docker-ce docker-ce-cli containerd.io
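
After installing, a quick sanity check (a sketch) that the runtime pieces are actually up before retrying the join:

systemctl status docker containerd
# Should report the cgroup driver the kubelet warning refers to
docker info | grep -i cgroup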

Voiced answered 2/4, 2019 at 18:31 Comment(0)

How is /var/lib/kubelet/config.yaml created?

Regarding the /var/lib/kubelet/config.yaml: no such file or directory error.

Below are steps that should occur on the worker node in order for the mentioned file to be created.

1) The creation of the /var/lib/kubelet/ folder.
It is created when the kubelet service is installed as mentioned here:

sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

2) The creation of config.yaml.
The kubeadm join flow has to take place: when you run kubeadm join, kubeadm uses the bootstrap token credential to perform a TLS bootstrap, which fetches the credential needed to download the kubelet-config-1.X ConfigMap, and writes it to /var/lib/kubelet/config.yaml.

After a successful execution you should see the logs below:

.
.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
.
.

So, after these 2 steps you should have /var/lib/kubelet/config.yaml in place.

Failure of the kubeadm join flow

In your case, it seems that the kubeadm join flow failed, which might happen for multiple reasons, such as a bad iptables configuration, ports that are already in use, a container runtime that is not installed properly, etc., as described here and here.
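
Before retrying the join on a worker stuck in that state, a cleanup-and-check sketch (placeholders in angle brackets are values from your own cluster):

# Undo the partial join so kubeadm starts from a clean slate
kubeadm reset -f
# Make sure nothing else is already listening on the kubelet port
ss -tlnp | grep 10250
# Re-run the join with extra verbosity to see where it stalls
kubeadm join <master-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> --v=2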

As far as I know, the fact that no CNI-compatible networking solution was in place should not affect the creation of /var/lib/kubelet/config.yaml:

A) We can see under the kubeadm preflight checks which issues will cause the join phase to fail.

B) I also tested this by removing the CNI solution I was using (Calico), then running kubeadm reset and kubeadm join again: no errors appeared in the kubeadm logs (I got the successful execution logs I mentioned above) and /var/lib/kubelet/config.yaml was created properly.

(*) Of course the cluster can't function in this state; I just wanted to emphasize that I think the problem was one of the options mentioned in A.

Assuming answered 29/9, 2020 at 21:16 Comment(0)
