kubelet won't start after kubernetes/manifest update
We are seeing some strange behavior in our K8s cluster.

When we try to deploy a new version of our applications, we get:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "<container-id>" network for pod "application-6647b7cbdb-4tp2v": networkPlugin cni failed to set up pod "application-6647b7cbdb-4tp2v_default" network: Get "https://[10.233.0.1]:443/api/v1/namespaces/default": dial tcp 10.233.0.1:443: connect: connection refused

I ran kubectl get cs and found the controller-manager and scheduler in an Unhealthy state.
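The output looked roughly like this (illustrative, not our exact output; these are the usual messages when the insecure /healthz ports are disabled):

    NAME                 STATUS      MESSAGE                                                                                     ERROR
    controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
    scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
    etcd-0               Healthy     {"health":"true"}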

As described here, I updated /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml by commenting out --port=0.
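The change looked roughly like this (excerpt from kube-scheduler.yaml; the exact flag list in your manifest may differ):

    spec:
      containers:
      - command:
        - kube-scheduler
        - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
        - --bind-address=127.0.0.1
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --leader-elect=true
        # - --port=0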

When I checked systemctl status kubelet, it was running:

Active: active (running) since Mon 2020-10-26 13:18:46 +0530; 1 years 0 months ago

I restarted the kubelet service, and the controller-manager and scheduler were then shown as healthy.

But now systemctl status kubelet shows this (for a few seconds right after restarting, kubelet showed a running state):

   Active: activating (auto-restart) (Result: exit-code) since Thu 2021-11-11 10:50:49 +0530; 3s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 21234 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET
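Since the unit is stuck in auto-restart, the actual failure reason only shows up in the journal; the standard way to pull the last kubelet messages is:

    journalctl -u kubelet --no-pager -n 50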

Tried adding Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false" to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf as described here, but it is still not working properly.
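For reference, the attempted drop-in (excerpt), followed by the reload systemd needs before a restart picks it up. Note that --allow-privileged has since been removed from the kubelet, so depending on the version an unrecognized flag like this can itself abort kubelet startup:

    # /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (excerpt)
    [Service]
    Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet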

Also reverted the change by removing the comment on --port=0 in the above-mentioned manifests and tried restarting; still the same result.

Edit: This issue was due to an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64-encoded when placed in /etc/kubernetes/kubelet.conf.
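A quick way to confirm this failure mode (standard openssl; the path is the kubeadm default):

    # Print the expiry date of the kubelet client certificate
    openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate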

Many others suggested running kubeadm init again, but this cluster was created with Kubespray, with no manually added nodes.

We have bare-metal Kubernetes v1.18.8 running on Ubuntu 18.04.

Any suggestions for debugging and fixing this would be appreciated.

PS: When we try to telnet 10.233.0.1 443 from any node, the first attempt fails and the second attempt succeeds.
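For reference, the check was simply the following, run from a node (nc shown as an alternative where telnet is not installed; 10.233.0.1 is the in-cluster kubernetes service VIP under Kubespray's default service CIDR):

    telnet 10.233.0.1 443
    nc -zv 10.233.0.1 443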

Edit: Found this in the kubelet service logs:

Nov 10 17:35:05 node1 kubelet[1951]: W1110 17:35:05.380982    1951 docker_sandbox.go:402] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "app-7b54557dd4-bzjd9_default": unexpected command output nsenter: cannot open /proc/12311/ns/net: No such file or directory
Optative answered 11/11, 2021 at 5:46. Comments (5):
I'm having a similar problem after updating to k8s v1.22.2 from v1.21 :/ – Garrity
I noticed that you are setting the flag --fail-swap-on=false, but can you try running sudo swapoff -a and then restarting kubelet? If that doesn't work, try changing the cgroup driver to systemd in Docker: edit /etc/docker/daemon.json to '{ "exec-opts": ["native.cgroupdriver=systemd"] }' and restart Docker with sudo systemctl restart docker (see the sketch after these comments). For more info refer to the doc. – Amphichroic
@ChandraKiranPasumarti kubelet is still not starting, even after the Docker restart. – Optative
Can you check whether any zombie processes are running on the nodes? Just ps aux | grep 'Z' – Argumentation
@Argumentation No zombie processes are shown. – Optative
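The cgroup-driver change suggested in the comments, as a sketch (assumes Docker is the container runtime; merge this into any existing /etc/docker/daemon.json rather than overwriting it):

    {
      "exec-opts": ["native.cgroupdriver=systemd"]
    }

    sudo systemctl restart docker
    sudo systemctl restart kubelet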

Posting the comment as a community wiki answer for better visibility:


This issue was due to an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64-encoded when placed in /etc/kubernetes/kubelet.conf.
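A sketch of that step, assuming the kubeadm default path; kubelet-client-current.pem normally contains both the certificate and key PEM blocks, so openssl is used here to split and encode each one:

    # Base64-encode the certificate block
    openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem | base64 -w0
    # Base64-encode the private key block
    openssl pkey -in /var/lib/kubelet/pki/kubelet-client-current.pem | base64 -w0

Paste the two outputs into the standard kubeconfig fields (client-certificate-data and client-key-data under the user entry) in /etc/kubernetes/kubelet.conf, then restart kubelet.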

Parenteau answered 11/11, 2021 at 5:46. Comments (0)
