I created a K8s cluster of 5 VMs (1 master and 4 slaves running Ubuntu 16.04.3 LTS) using kubeadm. I used flannel to set up networking in the cluster. I was able to successfully deploy an application. I then exposed it via a NodePort service. From here things got complicated for me.
Before I started, I disabled the default firewalld service on the master and the nodes.
As I understand from the K8s Services doc, the type NodePort exposes the service on all nodes in the cluster. However, when I created it, the service was exposed only on 2 nodes out of 4 in the cluster. I am guessing that's not the expected behavior (right?)
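Since a NodePort service is expected to answer on every node, one quick way to see which nodes actually respond is to probe the same port on each of them. The following is only a minimal sketch, assuming the node names resolve from wherever it runs and that curl is installed; check_nodeport and the PROBE override are hypothetical helper names, not part of kubectl or kubeadm.

```shell
#!/usr/bin/env bash
# Probe a NodePort on a list of nodes and report which ones answer.
# PROBE is a hypothetical hook so the probe command can be swapped out;
# by default it is curl with a 5-second connect timeout.
PROBE=${PROBE:-"curl -s -o /dev/null --connect-timeout 5"}

check_nodeport() {
  local port=$1
  shift
  local node
  for node in "$@"; do
    # curl exits non-zero when the TCP connection fails or times out
    if $PROBE "http://${node}:${port}/"; then
      echo "${node} OK"
    else
      echo "${node} FAIL"
    fi
  done
}

# Example (node names and port taken from the output in this question):
# check_nodeport 30847 vm-vivekse-004 vm-rosnthom-00f vm-plashkar-006 vm-deepejai-00b
```

Running the same function on each node with 127.0.0.1 as the only target would reproduce the local checks shown further down.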
For troubleshooting, here are some resource specs:
root@vm-vivekse-003:~# kubectl get nodes
NAME STATUS AGE VERSION
vm-deepejai-00b Ready 5m v1.7.3
vm-plashkar-006 Ready 4d v1.7.3
vm-rosnthom-00f Ready 4d v1.7.3
vm-vivekse-003 Ready 4d v1.7.3 //the master
vm-vivekse-004 Ready 16h v1.7.3
root@vm-vivekse-003:~# kubectl get pods -o wide -n playground
NAME READY STATUS RESTARTS AGE IP NODE
kubernetes-bootcamp-2457653786-9qk80 1/1 Running 0 2d 10.244.3.6 vm-rosnthom-00f
springboot-helloworld-2842952983-rw0gc 1/1 Running 0 1d 10.244.3.7 vm-rosnthom-00f
root@vm-vivekse-003:~# kubectl get svc -o wide -n playground
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
sb-hw-svc 10.101.180.19 <nodes> 9000:30847/TCP 5h run=springboot-helloworld
root@vm-vivekse-003:~# kubectl describe svc sb-hw-svc -n playground
Name: sb-hw-svc
Namespace: playground
Labels: <none>
Annotations: <none>
Selector: run=springboot-helloworld
Type: NodePort
IP: 10.101.180.19
Port: <unset> 9000/TCP
NodePort: <unset> 30847/TCP
Endpoints: 10.244.3.7:9000
Session Affinity: None
Events: <none>
root@vm-vivekse-003:~# kubectl get endpoints sb-hw-svc -n playground -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2017-08-09T06:28:06Z
  name: sb-hw-svc
  namespace: playground
  resourceVersion: "588958"
  selfLink: /api/v1/namespaces/playground/endpoints/sb-hw-svc
  uid: e76d9cc1-7ccb-11e7-bc6a-fa163efaba6b
subsets:
- addresses:
  - ip: 10.244.3.7
    nodeName: vm-rosnthom-00f
    targetRef:
      kind: Pod
      name: springboot-helloworld-2842952983-rw0gc
      namespace: playground
      resourceVersion: "473859"
      uid: 16d9db68-7c1a-11e7-bc6a-fa163efaba6b
  ports:
  - port: 9000
    protocol: TCP
After some tinkering I realized that on the 2 "faulty" nodes, the service was not reachable even from within those hosts themselves.
Node01 (working):
root@vm-vivekse-004:~# curl 127.0.0.1:30847 //<localhost>:<nodeport>
Hello Docker World!!
root@vm-vivekse-004:~# curl 10.101.180.19:9000 //<cluster-ip>:<port>
Hello Docker World!!
root@vm-vivekse-004:~# curl 10.244.3.7:9000 //<pod-ip>:<port>
Hello Docker World!!
Node02 (working):
root@vm-rosnthom-00f:~# curl 127.0.0.1:30847
Hello Docker World!!
root@vm-rosnthom-00f:~# curl 10.101.180.19:9000
Hello Docker World!!
root@vm-rosnthom-00f:~# curl 10.244.3.7:9000
Hello Docker World!!
Node03 (not working):
root@vm-plashkar-006:~# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
root@vm-plashkar-006:~# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
root@vm-plashkar-006:~# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Node04 (not working):
root@vm-deepejai-00b:/# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
root@vm-deepejai-00b:/# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
root@vm-deepejai-00b:/# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
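The three curl checks above go through progressively more layers: the pod IP is the most direct path, the cluster IP adds the service rules, and the NodePort adds the node-level port on top. A rough sketch that runs them in that order and reports the first one that fails might look like this; first_failing_layer and PROBE are hypothetical names introduced here for illustration, and the default probe assumes curl is available.

```shell
#!/usr/bin/env bash
# Run the three checks from the most direct path (pod IP) to the most
# indirect (NodePort) and print the first layer that does not answer.
# PROBE is a hypothetical hook; the default is curl with a short timeout.
PROBE=${PROBE:-"curl -s -o /dev/null --connect-timeout 5"}

first_failing_layer() {
  local pod=$1 cluster=$2 nodeport=$3
  if ! $PROBE "http://${pod}/"; then
    echo "pod-ip"        # pod network itself is unreachable from this node
  elif ! $PROBE "http://${cluster}/"; then
    echo "cluster-ip"    # pod reachable, but not via the service IP
  elif ! $PROBE "http://${nodeport}/"; then
    echo "nodeport"      # service IP works, NodePort does not
  else
    echo "none"
  fi
}

# Example, using the addresses from this question:
# first_failing_layer 10.244.3.7:9000 10.101.180.19:9000 127.0.0.1:30847
```

On the two non-working nodes the direct pod IP check already times out, so this sketch would be expected to stop at the first branch there.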
Tried netstat and telnet on all 4 slaves. Here's the output:
Node01 (the working host):
root@vm-vivekse-004:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 27808/kube-proxy
root@vm-vivekse-004:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node02 (the working host):
root@vm-rosnthom-00f:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 11842/kube-proxy
root@vm-rosnthom-00f:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node03 (the not-working host):
root@vm-plashkar-006:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 7791/kube-proxy
root@vm-plashkar-006:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Node04 (the not-working host):
root@vm-deepejai-00b:/# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 689/kube-proxy
root@vm-deepejai-00b:/# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Additional info:
From the kubectl get pods output, I can see that the pod is actually deployed on slave vm-rosnthom-00f. I am able to ping this host from all 5 VMs, and curl vm-rosnthom-00f:30847 also works from all the VMs.
I can clearly see that the internal cluster networking is messed up, but I am unsure how to resolve it! The iptables -L output is identical for all the slaves, and even the local loopback (ifconfig lo) is up and running on all the slaves. I'm completely clueless as to how to fix it!
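Comparing firewall state by eye is error-prone, and iptables -L on its own only lists the filter table. One possible way to compare nodes more thoroughly is to dump all tables with iptables-save on each node and diff the dumps after stripping the parts that always differ (comment lines with timestamps, and packet/byte counters). This is only a sketch under those assumptions; normalize_rules and compare_rules are hypothetical helper names, and the file paths in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Compare two iptables-save dumps while ignoring noise.

normalize_rules() {
  # Drop comment lines (they carry timestamps) and reset [packets:bytes]
  # counters so that only rules and chain policies remain.
  sed -e '/^#/d' -e 's/\[[0-9]*:[0-9]*\]/[0:0]/' "$1"
}

compare_rules() {
  local a b rc
  a=$(mktemp)
  b=$(mktemp)
  normalize_rules "$1" > "$a"
  normalize_rules "$2" > "$b"
  rc=0
  diff "$a" "$b" || rc=$?   # diff exits 1 when the files differ
  rm -f "$a" "$b"
  return "$rc"
}

# Example (run as root on each node, then copy the files to one host):
#   iptables-save > /tmp/$(hostname).rules
#   compare_rules /tmp/vm-vivekse-004.rules /tmp/vm-plashkar-006.rules
```

Because the dump includes chain policy lines as well as rules from every table, differences that do not show up when skimming iptables -L would be visible in the diff.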
root@vm-deepejai-00b:/# curl THE_IP_OF_vm-vivekse-004:30847 to ensure vm-deepejai-00b can conceivably route traffic to vm-vivekse-004, because that's what is happening under the covers anyway – Guillot
iptables -t nat -L as well as just iptables -L (I couldn't tell if that's what you meant) – Guillot
root@vm-deepejai-00b:~# curl 173.36.23.4:30847 <enter> Hello Docker World!! where 173.36.23.4 is the IP of vm-vivekse-004 – Finial
iptables -L for all slaves are identical. Actually, iptables -L is identical for Node02 (working node) and Node04 (non-working node) [diffchecker.com/JZzyspEL] and identical for Node01 (working node) and Node03 (non-working node) [diffchecker.com/3X6WkdMR] – Finial
iptables -t nat -L is almost identical for Node02 & Node04 [diffchecker.com/me6PhHCd] and identical for Node01 and Node03 [diffchecker.com/CusUUMnN] – Finial
kubectl get pod -n kube-system -o wide please? – Pansir