kubernetes service IPs not reachable

So I've got a Kubernetes cluster up and running using the Kubernetes on CoreOS Manual Installation Guide.

$ kubectl get no
NAME              STATUS                     AGE
coreos-master-1   Ready,SchedulingDisabled   1h
coreos-worker-1   Ready                      54m

$ kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health": "true"}
etcd-2               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}

$ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                      READY     STATUS    RESTARTS   AGE       IP               NODE
default       curl-2421989462-h0dr7                     1/1       Running   1          53m       10.2.26.4        coreos-worker-1
kube-system   busybox                                   1/1       Running   0          55m       10.2.26.3        coreos-worker-1
kube-system   kube-apiserver-coreos-master-1            1/1       Running   0          1h        192.168.0.200   coreos-master-1
kube-system   kube-controller-manager-coreos-master-1   1/1       Running   0          1h        192.168.0.200   coreos-master-1
kube-system   kube-proxy-coreos-master-1                1/1       Running   0          1h        192.168.0.200   coreos-master-1
kube-system   kube-proxy-coreos-worker-1                1/1       Running   0          58m       192.168.0.204   coreos-worker-1
kube-system   kube-scheduler-coreos-master-1            1/1       Running   0          1h        192.168.0.200   coreos-master-1

$ kubectl get svc --all-namespaces
NAMESPACE   NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
default     kubernetes   10.3.0.1     <none>        443/TCP   1h

As with the guide, I've set up a service network 10.3.0.0/16 and a pod network 10.2.0.0/16. The pod network seems fine, as the busybox and curl containers get IPs. But the service network has problems. I originally encountered this when deploying kube-dns: the service IP 10.3.0.1 couldn't be reached, so kube-dns couldn't start all of its containers and DNS ultimately wasn't working.

From within the curl pod, I can reproduce the issue:

[ root@curl-2421989462-h0dr7:/ ]$ curl https://10.3.0.1
curl: (7) Failed to connect to 10.3.0.1 port 443: No route to host

[ root@curl-2421989462-h0dr7:/ ]$ ip route
default via 10.2.26.1 dev eth0
10.2.0.0/16 via 10.2.26.1 dev eth0
10.2.26.0/24 dev eth0  src 10.2.26.4

It seems OK that there's only a default route in the container. As I understood it, the request (to the default route) should be intercepted by the kube-proxy on the worker node and forwarded to the proxy on the master node, where the IP is translated via iptables to the master's public IP.

There seems to be a common problem with a bridge/netfilter sysctl setting, but that seems fine in my setup:

core@coreos-worker-1 ~ $ sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1

I'm having a really hard time troubleshooting this, as I lack an understanding of what the service IP is used for, how the service network is supposed to work in terms of traffic flow, and how best to debug it.

So here are the questions I have:

  • What is the first IP of the service network (10.3.0.1 in this case) used for?
  • Is the above description of the traffic flow correct? If not, what steps does a request from a container take to reach a service IP?
  • What are the best ways to debug each step in the traffic flow? (I can't get any idea of what's wrong from the logs.)

Thanks!

Standley answered 9/3, 2017 at 21:8

The Service network provides fixed IPs for Services. It is not a routable network (so don't expect ip ro to show anything, nor will ping work) but a collection of iptables rules managed by kube-proxy on each node (see iptables -L; iptables -t nat -L on the nodes, not the Pods). These virtual IPs act as a load-balancing proxy for endpoints (kubectl get ep), which are usually ports of Pods (but not always) with a specific set of labels, as defined in the Service.
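
For illustration, on a node you could look for the NAT rules kube-proxy programs for the kubernetes Service. Chain names like KUBE-SERVICES/KUBE-SVC-* are what the iptables proxy mode creates; exact names and output vary by version, so treat this as a rough sketch:

core@coreos-worker-1 ~ $ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.3.0.1
# expect a rule matching destination 10.3.0.1 on tcp dpt:443 that jumps to a KUBE-SVC-* chain;
# if nothing shows up, kube-proxy has not programmed the rules on this node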

The first IP on the Service network is for reaching the kube-apiserver itself. It's listening on port 443 (kubectl describe svc kubernetes).
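
As a quick sanity check (the comments describe what to look for, not literal output):

$ kubectl describe svc kubernetes
# ClusterIP should be 10.3.0.1 with Port 443/TCP
$ kubectl get ep kubernetes
# the endpoints should be the apiserver's address and secure port (192.168.0.200 in your cluster)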

Troubleshooting is different on each network/cluster setup. I would generally check:

  • Is kube-proxy running on each node? On some setups it's run via systemd and on others there is a DaemonSet that schedules a Pod on each node. On your setup it is deployed as static Pods created by the kubelets themselves from /etc/kubernetes/manifests/kube-proxy.yaml (a few quick checks are sketched right after this list)
  • Locate logs for kube-proxy and find clues (can you post some?)
  • Change kube-proxy into userspace mode. Again, the details depend on your setup. For you it's in the file I mentioned above. Append --proxy-mode=userspace as a parameter on each node
  • Is the overlay (pod) network functional?
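
A minimal sketch of those checks on this particular setup (the docker command assumes you can shell into the nodes; the pod and file names are the ones shown earlier in the question):

core@coreos-worker-1 ~ $ docker ps | grep kube-proxy
# the static kube-proxy container should be up and not restarting

$ kubectl -n kube-system logs kube-proxy-coreos-worker-1
# kube-proxy logs via the API, using the mirror pod name from 'kubectl get pods'

core@coreos-worker-1 ~ $ grep -A10 command: /etc/kubernetes/manifests/kube-proxy.yaml
# shows the flags kube-proxy starts with, e.g. whether --proxy-mode is set

[ root@curl-2421989462-h0dr7:/ ]$ ping -c1 10.2.26.3
# pod-to-pod reachability on the overlay (both pods happen to be on the same node here)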

If you leave comments, I'll get back to you.

Selwyn answered 13/3, 2017 at 9:41
Thanks for the description, it helped me debug and resolve the issue! As mentioned, the logs showed nothing special (basically they reported the iptables rules that got added). So I checked with iptables -j LOG statements whether DNAT was working and whether replies were also arriving, which they did, so I concluded it was a local forwarding-to-container issue. Looking at the node's routing table I saw that docker0 had the same subnet as cni0. Checking against the guide, I had missed the docker_opts_cni.env part. After correcting that, docker0 got another subnet and everything started to work. Thanks! – Standley
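
A rough sketch of such a LOG check (the chain and match here are illustrative, not the exact rules used above):

core@coreos-worker-1 ~ $ sudo iptables -t nat -I PREROUTING 1 -d 10.3.0.1 -p tcp --dport 443 -j LOG --log-prefix "svc-dnat: "
core@coreos-worker-1 ~ $ journalctl -k | grep svc-dnat
# matching packets show up in the kernel log; delete the rule afterwards with 'sudo iptables -t nat -D PREROUTING 1'
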
Well done on debugging it! My answer was a bit difficult to understand; I composed it on my phone. Sorry :) – Selwyn
Thank you for the nudge; it turns out that if you set the service and pod networks to the same IP range, requests never reach the iptables nat rules, because ARP fails first :) – Iolaiolande

I had this same problem, and the ultimate solution that worked for me was enabling IP forwarding on all nodes in the cluster, which I had neglected to do.

$ sudo sysctl net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1

Service IPs and DNS started working immediately afterwards.
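
A quick way to verify the flag on a node without changing it (it needs to be 1 on every node):

$ cat /proc/sys/net/ipv4/ip_forward
1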

Puduns answered 2/2, 2018 at 1:16
Thanks. Just want to add that this command won't make the change permanent. In order to make it permanent, edit /etc/sysctl.conf and uncomment the #net.ipv4.ip_forward=1 line. – Lissalissak
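
A sketch of making it persistent (the drop-in file name is arbitrary; on systemd-based systems a file under /etc/sysctl.d/ works as well as /etc/sysctl.conf):

$ echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf
$ sudo sysctl --system
# reloads all sysctl configuration files, including the new drop-in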

I had the same issue; it turned out to be a configuration issue in kube-proxy.yaml. For the "master" parameter I had only the IP address, as in - --master=192.168.3.240, but it actually needs to be a URL, like - --master=https://192.168.3.240

FYI, my kube-proxy successfully uses --proxy-mode=iptables (v1.6.x)
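
A quick way to spot this on a node (same static-manifest path as in the accepted answer; the output lines are just the shape to look for):

$ grep -E 'master|proxy-mode' /etc/kubernetes/manifests/kube-proxy.yaml
    - --master=https://192.168.3.240
    - --proxy-mode=iptables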

Decry answered 17/5, 2017 at 12:32
