Kubernetes dial tcp myIP:10250: connect: no route to host

I have a Kubernetes cluster with 1 master and 3 worker nodes.

Calico v3.7.3 and Kubernetes v1.16.0, installed via Kubespray (https://github.com/kubernetes-sigs/kubespray).

Until now, I deployed all pods without any problems.

Now I can't start a few pods (Ceph):

kubectl get all --namespace=ceph
NAME                                 READY   STATUS             RESTARTS   AGE
pod/ceph-cephfs-test                 0/1     Pending            0          162m
pod/ceph-mds-665d849f4f-fzzwb        0/1     Pending            0          162m
pod/ceph-mon-744f6dc9d6-jtbgk        0/1     CrashLoopBackOff   24         162m
pod/ceph-mon-744f6dc9d6-mqwgb        0/1     CrashLoopBackOff   24         162m
pod/ceph-mon-744f6dc9d6-zthpv        0/1     CrashLoopBackOff   24         162m
pod/ceph-mon-check-6f474c97f-gjr9f   1/1     Running            0          162m


NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/ceph-mon   ClusterIP   None         <none>        6789/TCP   162m

NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR       AGE
daemonset.apps/ceph-osd   0         0         0       0            0           node-type=storage   162m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ceph-mds         0/1     1            0           162m
deployment.apps/ceph-mon         0/3     3            0           162m
deployment.apps/ceph-mon-check   1/1     1            1           162m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/ceph-mds-665d849f4f        1         1         0       162m
replicaset.apps/ceph-mon-744f6dc9d6        3         3         0       162m
replicaset.apps/ceph-mon-check-6f474c97f   1         1         1       162m

But another namespace is fine:

kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6d57b44787-xlj89   1/1     Running   19         24d
calico-node-dwm47                          1/1     Running   310        19d
calico-node-hhgzk                          1/1     Running   15         24d
calico-node-tk4mp                          1/1     Running   309        19d
calico-node-w7zvs                          1/1     Running   312        19d
coredns-74c9d4d795-jrxjn                   1/1     Running   0          2d23h
coredns-74c9d4d795-psf2v                   1/1     Running   2          18d
dns-autoscaler-7d95989447-7kqsn            1/1     Running   10         24d
kube-apiserver-master                      1/1     Running   4          24d
kube-controller-manager-master             1/1     Running   3          24d
kube-proxy-9bt8m                           1/1     Running   2          19d
kube-proxy-cbrcl                           1/1     Running   4          19d
kube-proxy-stj5g                           1/1     Running   0          19d
kube-proxy-zql86                           1/1     Running   0          19d
kube-scheduler-master                      1/1     Running   3          24d
kubernetes-dashboard-7c547b4c64-6skc7      1/1     Running   591        24d
nginx-proxy-worker1                        1/1     Running   2          19d
nginx-proxy-worker2                        1/1     Running   0          19d
nginx-proxy-worker3                        1/1     Running   0          19d
nodelocaldns-6t92x                         1/1     Running   2          19d
nodelocaldns-kgm4t                         1/1     Running   0          19d
nodelocaldns-xl8zg                         1/1     Running   0          19d
nodelocaldns-xwlwk                         1/1     Running   12         24d
tiller-deploy-8557598fbc-7f2w6             1/1     Running   0          131m

I use CentOS 7:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

The error log:

Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host

Has anyone come across this and can help me? I will provide any additional information.

Scheduler events from the pending pods:

Warning FailedScheduling 98s (x125 over 3h1m) default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.
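
This event means the Ceph pods request a nodeSelector (node-type=storage, per the DaemonSet above) that no node currently carries. A minimal sketch of how to inspect and fix the labels; worker1 is only an example node name:

# Show the nodeSelector a pending pod asks for
kubectl get pod ceph-mds-665d849f4f-fzzwb -n ceph -o jsonpath='{.spec.nodeSelector}'

# Show the labels each node currently has
kubectl get nodes --show-labels

# Label only the node(s) intended for storage (worker1 is an example name)
kubectl label node worker1 node-type=storage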

Billie answered 7/10, 2019 at 10:58 Comment(12)
I would pay more attention to the pending pods. Probably that's why the other ones are crashing. – Imbrication
Warning FailedScheduling 5m10s (x119 over 3h) default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector. – Billie
That means you have a nodeSelector in your YAML file, but none of your nodes is labeled with that selector, so the scheduler can't schedule the pod on any node. Get the YAML file, find the nodeSelector, and run kubectl label node NODE key=value, where key=value is your nodeSelector. – Imbrication
I did it: kubectl label nodes node-type=storage --all, and now all pods failed: Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "e34272b14a996518cec3895830981fc775a930a95719c4f7b1dc4e6a6ce42f2d" network for pod "ceph-mon-744f6dc9d6-5jjr2": NetworkPlugin cni failed to set up pod "ceph-mon-744f6dc9d6-5jjr2_ceph" network: dial tcp 10.2.67.201:2379: connect: no route to host, failed to clean up sandbox container " – Billie
Sounds like a pod network error. I don't know why; 2379 is etcd, but I can't relate it to the error. If you check the logs of Calico, does everything seem fine? Are the firewalls correct? (See the connectivity sketch after these comments.) – Imbrication
2019-10-07 13:41:10 /opt/ceph-container/bin/entrypoint.sh: k8s: config is stored as k8s secrets. 2019-10-07 13:41:10 /opt/ceph-container/bin/entrypoint.sh: k8s: does not generate the admin key. Use Kubernetes secrets instead. 2019-10-07 13:41:10 /opt/ceph-container/bin/entrypoint.sh: Creating osd unable to get monitor info from DNS SRV with service name: ceph-mon [errno 2] error connecting to the cluster – Billie
I would say this is a Ceph-specific issue. I added the ceph tag. – Imbrication
Can you add information about your Kubernetes and Calico versions, please? – Lutenist
Calico v3.7.3, Kubernetes 1.16, installed from Kubespray: github.com/kubernetes-sigs/kubespray – Billie
I updated my kernel, but it still doesn't work: Linux master 5.3.6-1.el7.elrepo.x86_64 #1 SMP Fri Oct 11 17:24:39 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux – Billie
Blindly labelling nodes as node-type: storage doesn't fix the underlying problem. The error you see is unrelated to Ceph or to the node labels; it specifically states that your CNI provider (Calico) was unable to set up the pod's network. Since your calico-node pods, which are responsible for configuring the pod network, are crash-looping, it is likely that they are the underlying cause of the issues you are seeing. How did you deploy Ceph? – Jennie
Did you solve the issue? If you did, what exactly did you do? – Firstborn
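
The no route to host errors in the comments above point at basic connectivity rather than Kubernetes objects. A minimal sketch for checking the two ports from the master node (IPs and ports taken from the errors in this thread; any response, even a TLS error, means the route works):

# etcd client port from the CNI sandbox error
curl -vk https://10.2.67.201:2379/health

# kubelet port from the log-retrieval error
curl -vk https://10.2.67.203:10250/healthz

# If either fails with "no route to host", inspect the firewall on the target node
sudo firewall-cmd --list-all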

It seems that a firewall is blocking ingress traffic to port 10250 on the 10.2.67.203 node.

You can open it by running the commands below (assuming firewalld is installed; otherwise, run the equivalent commands for your firewall):

sudo firewall-cmd --add-port=10250/tcp --permanent
sudo firewall-cmd --reload
sudo firewall-cmd --list-all  # you should see that port `10250` is updated
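
To confirm the change took effect, a quick sketch using the node IP from the question; even a 401 Unauthorized response from the kubelet proves the route is now open:

# On the node: confirm the rule is active (prints "yes" when the port is open)
sudo firewall-cmd --query-port=10250/tcp

# From the master: any HTTP/TLS response (even 401) means the route now works
curl -vk https://10.2.67.203:10250/healthz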
Canonize answered 30/9, 2020 at 21:15 Comment(0)

tl;dr: It looks like your cluster itself is fairly broken and should be repaired before looking at Ceph specifically.

Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host

10250 is the port that the Kubernetes API server uses to connect to a node's Kubelet to retrieve the logs.

This error indicates that the Kubernetes API server is unable to reach the node. It has nothing to do with your containers, pods, or even your CNI network. no route to host indicates one of the following:

  1. The host is unavailable
  2. Network segmentation has occurred
  3. The Kubelet is unable to answer the API server

Before addressing issues with the Ceph pods I would investigate why the Kubelet isn't reachable from the API server.
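
A minimal triage sketch for distinguishing those three cases, run against the affected node (IP from the error above; the systemd commands assume the CentOS 7 setup described in the question):

# From the master: is the node up at all?
ping -c 3 10.2.67.203

# From the master: does anything answer on the kubelet port?
# ("no route to host" here is typically a firewall REJECT or a routing problem)
curl -vk https://10.2.67.203:10250/healthz

# On the affected node: is the kubelet running, and what is it logging?
sudo systemctl status kubelet
sudo journalctl -u kubelet --since "1 hour ago"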

After you have solved the underlying network connectivity issues, I would address the crash-looping Calico pods (you can see the logs of the previously executed containers by running kubectl logs -n kube-system calico-node-dwm47 -p).
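
For example, to survey the restart counts and pull the logs of the last crashed container (the k8s-app=calico-node label is the usual Calico convention; treat it as an assumption for your manifests):

# Restart counts for all calico-node pods
kubectl get pods -n kube-system -l k8s-app=calico-node

# Logs from the previous (crashed) container instance of one pod
kubectl logs -n kube-system calico-node-dwm47 -p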

Once you have both the underlying network and the pod network sorted, I would address the crash-looping Kubernetes Dashboard, and finally start to investigate why you are having issues deploying Ceph.

Jennie answered 8/11, 2019 at 17:36 Comment(7)
kubectl get pod -ns kube-system calico-node-dwm47 -p shows calico-node-dwm47 1/1 Running 311 60d, and for a long time there were no errors in the API server logs. But there was: http: TLS handshake error from 10.2.67.26:49312: remote error: tls: bad certificate – Billie
And curl -X GET https://10.2.67.201:6443/healthz -k returns ok. – Billie
And this port: curl -v -i 10.2.67.201:10250 gives * About to connect() to 10.2.67.201 port 10250 (#0) * Trying 10.2.67.201... * Connected to 10.2.67.201 (10.2.67.201) port 10250 (#0) > GET / HTTP/1.1 – Billie
My apologies, I gave you the wrong command; you should use kubectl logs -p, not kubectl get pods. – Jennie
In addition to checking the kube-apiserver logs, you should also check the kubelet logs on an affected host. – Jennie
Only the one WARN entry: 2019-11-18 08:42:03.354 [WARNING][53] active_rules_calculator.go 326: Profile not known or invalid, generating dummy profile that drops all traffic. profileID="ksa.ceph.default" – Billie
Well, dropping all traffic means that it won't accept packets, so your pod network won't transit any traffic, if I understand the log entry correctly. Not sure where the log came from, but I assume it is from the calico-node pods? – Jennie
