How to debug kubectl apply for kube-flannel.yml?

I'm trying to create a Kubernetes cluster following the document at: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

First, I installed kubeadm using a Docker image on CoreOS (1520.9.0) inside VirtualBox with Vagrant:

docker run -it \
    -v /etc:/rootfs/etc \
    -v /opt:/rootfs/opt \
    -v /usr/bin:/rootfs/usr/bin \
    -e K8S_VERSION=v1.8.4 \
    -e CNI_RELEASE=v0.6.0 \
    xakra/kubeadm-installer:0.4.7 coreos

This was my kubeadm init:

kubeadm init --pod-network-cidr=10.244.0.0/16

When I run the command:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

It returns:

clusterrole "flannel" configured
clusterrolebinding "flannel" configured
serviceaccount "flannel" configured
configmap "kube-flannel-cfg" configured
daemonset "kube-flannel-ds" configured

But when I check "kubectl get pods --all-namespaces"

It returns:

NAMESPACE     NAME                              READY     STATUS             RESTARTS   AGE
kube-system   etcd-coreos1                      1/1       Running            0          18m
kube-system   kube-apiserver-coreos1            1/1       Running            0          18m
kube-system   kube-controller-manager-coreos1   0/1       CrashLoopBackOff   8          19m
kube-system   kube-scheduler-coreos1            1/1       Running            0          18m

With journalctl -f -u kubelet I can see this error: Unable to update cni config: No networks found in /etc/cni/net.d

I suspect that something was wrong with the command kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

Is there a way to know why this command doesn't work? Can I get some logs from anywhere?

Farrington answered 5/12, 2017 at 19:24 Comment(9)
Hi, can you list the files in the directory /etc/cni/net.d? – Persist
Hi, there isn't anything in this directory. – Farrington
As I understand it, you need to put the kube-flannel.yml manifest file in that directory; kubelet will deploy it. – Persist
I think this is where the network configuration provided by the flannel yml should be. – Farrington
Yeah, you can try putting that flannel yaml file in that directory; hopefully it will work. – Persist
Nope, the flannel manifest should not be put inside that directory. What you did is correct (kubectl apply). That directory should contain a CNI configuration file which determines the subnet for the current node, and that file should be created automatically. My guess is that you forgot to specify the subnet when you ran kubeadm init. Could you share the exact command you used? – Lodgings
Thanks @whites11, I did kubeadm init --pod-network-cidr=10.244.0.0/16 – Farrington
Ensure that your kubelet is using cni as the network plugin. Try with ps -afe | grep kubelet – Lodgings
I think so: kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local --fail-swap-on=false – Farrington

Just tonight I used kubespray to provision a Vagrant cluster on CoreOS, using flannel (vxlan), and I was also mystified about how flannel could run as a Pod inside Kubernetes.

It turns out, as seen here, that they are using the flannel-cni image from quay.io to write out the CNI files, using a flannel side-car plus hostPath volume-mounts; it outputs cni-conf.json (which configures CNI to use flannel) and then net-conf.json (which configures the subnet and backend used by flannel).
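
For reference, here is a rough way to confirm on the node whether that side-car has actually written the CNI config. This is a sketch; the file name 10-flannel.conf and the JSON content shown in the comments are assumptions based on the flannel-cni defaults, so check what is really in the directory:

ls /etc/cni/net.d/
cat /etc/cni/net.d/10-flannel.conf
# expected to be roughly a CNI config that delegates to the flannel plugin, e.g.
# { "name": "cbr0", "type": "flannel", "delegate": { "isDefaultGateway": true } }

If the directory is still empty, the kube-flannel-ds pod (or its install side-car) most likely never ran, or crashed before writing the file, which points back at those pods rather than at the kubectl apply itself.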

I hope the jinja2 mustache syntax doesn't obfuscate the answer, but I found it very interesting to see how the Kubernetes folks chose to do it "for real", to compare and contrast against the example DaemonSet given in the flannel-cni README. I guess that's the long way of saying: try the descriptors in the flannel-cni README, and if they don't work, see whether they differ in some way from the known-working kubespray setup.

Update: as a concrete example, observe that the Documentation yaml doesn't include the --iface= switch, and if your Vagrant setup uses both NAT and a "private_network", it likely means flannel is binding to eth0 (the NAT one) and not to eth1, which has the more static IP. I saw that caveat mentioned in the docs, but can't immediately recall where in order to cite it.
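
As a sketch of that change (assuming eth1 is the "private_network" interface in your Vagrant boxes; the interface name and the app=flannel label selector are assumptions about this particular setup and manifest):

curl -LO https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
# edit the kube-flannel container's args in the DaemonSet, e.g.
#   args: [ "--ip-masq", "--kube-subnet-mgr", "--iface=eth1" ]
kubectl apply -f kube-flannel.yml
# then check which interface flanneld actually bound to:
kubectl --namespace=kube-system logs -l app=flannel | grep -i interface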

Update 2:

Is there a way to know why this command doesn't work? Can I get some logs from anywhere?

One can almost always access the logs of a Pod (even a statically defined one such as kube-controller-manager-coreos1) in the same manner: kubectl --namespace=kube-system logs kube-controller-manager-coreos1. In the CrashLoopBackOff circumstance, adding -p (for "previous") will show the logs from the most recent crash, but only for a few seconds after the crash, not indefinitely. Occasionally kubectl --namespace=kube-system describe pod kube-controller-manager-coreos1 will show helpful information, either in the Events section at the bottom or in the Status block near the top if it was Terminated for cause.
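
Put together, the triage commands from the paragraph above look like this, using the controller-manager pod from the question as the example:

kubectl --namespace=kube-system logs kube-controller-manager-coreos1
kubectl --namespace=kube-system logs -p kube-controller-manager-coreos1
kubectl --namespace=kube-system describe pod kube-controller-manager-coreos1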

In the case of a very bad failure, such as the apiserver failing to come up (and thus kubectl logs won't do anything), fall back to ssh-ing to the Node and using a mixture of journalctl -u kubelet.service --no-pager --lines=150 and docker logs ${the_sha_or_name} to try to see any error text. You will almost certainly need docker ps -a in the latter case to find the exited container's sha or name, and that same "only for a few seconds" caveat applies there too, as dead containers are pruned after some time.
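
On the node itself, that amounts to something like the following (a sketch assuming Docker is the container runtime, as in this setup; the grep pattern is only an example for finding the exited controller-manager container):

journalctl -u kubelet.service --no-pager --lines=150
docker ps -a | grep kube-controller-manager
docker logs ${the_sha_or_name}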

In the case of Vagrant, one can ssh into the VM in one of several ways:

  • vagrant ssh coreos1
  • vagrant ssh-config > ssh-config && ssh -F ssh-config coreos1
  • or, if it has a "private_network" address such as 192.168.99.101, you can usually ssh -i ~/.vagrant.d/insecure_private_key core@192.168.99.101, but one of the first two is almost always more convenient
Awry answered 9/12, 2017 at 9:19 Comment(1)
Thanks @Matthew, kubectl --namespace=kube-system describe pod kube-controller-manager-coreos1 gives me the error: mkdir /usr/libexec/kubernetes: read-only file system, which I think is related to the kubeadm bug github.com/kubernetes/kubeadm/issues/410 and fixed in Kubernetes by github.com/kubernetes/kubernetes/pull/51296 – Farrington
