Kubernetes metrics-server FailedDiscoveryCheck

I was hoping to get a little help; my Google-fu didn't get me much closer. I'm trying to install the metrics server on my 4-node Fedora CoreOS Kubernetes cluster like so:

kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

The service never seems to become available:

kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-03-04T16:53:33Z
  Resource Version:    1611816
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 65d9a56a-c548-4d7e-a647-8ce7a865a266
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-03-04T16:53:33Z
    Message:               failing or missing response from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: bad status from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: 403
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

Here is what I have found while diagnosing, from googling around:

kubectl get deploy,svc -n kube-system |egrep metrics-server
deployment.apps/metrics-server   1/1     1            1           8m7s
service/metrics-server   ClusterIP   10.3.230.59   <none>        443/TCP         8m7s

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
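When the aggregated API is unavailable like this, it can help to pull just the `Available` condition straight from the APIService object rather than reading the full `describe` output. A small diagnostic sketch (it assumes kubectl access to the same cluster):

```shell
# Print only the Available condition's message for the metrics APIService
kubectl get apiservice v1beta1.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'
```

For the cluster above this prints the same `bad status ... 403` message shown in the FailedDiscoveryCheck condition.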

kubectl get all --all-namespaces | grep -i metrics-server
kube-system      pod/metrics-server-75b5d446cd-zj4jm                              1/1     Running   0          9m11s
kube-system   service/metrics-server   ClusterIP      10.3.230.59    <none>        443/TCP                                     9m11s
kube-system      deployment.apps/metrics-server   1/1     1            1           9m11s
kube-system      replicaset.apps/metrics-server-75b5d446cd   1         1         1       9m11s

kubectl logs -f metrics-server-75b5d446cd-zj4jm -n kube-system
I0304 16:53:36.475657       1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
W0304 16:53:38.229267       1 authentication.go:296] Cluster doesn't provide requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
I0304 16:53:38.267760       1 secure_serving.go:116] Serving securely on [::]:4443
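The `authentication.go` warning above is a useful clue: the front-proxy CA used by the aggregation layer is not being published to the cluster. One way to confirm this (a sketch, assuming standard conventions; `extension-apiserver-authentication` is the ConfigMap kube-apiserver populates when its `--requestheader-*` flags are set):

```shell
# If this prints nothing, the kube-apiserver was likely started without
# the aggregation-layer --requestheader-* flags
kubectl -n kube-system get configmap extension-apiserver-authentication \
  -o jsonpath='{.data.requestheader-client-ca-file}'
```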

kubectl get -n kube-system deployment metrics-server -o yaml | grep -i args -A 10
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-server","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"metrics-server"}},"template":{"metadata":{"labels":{"k8s-app":"metrics-server"},"name":"metrics-server"},"spec":{"containers":[{"args":["--cert-dir=/tmp","--secure-port=4443","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"],"image":"k8s.gcr.io/metrics-server-amd64:v0.3.6","imagePullPolicy":"IfNotPresent","name":"metrics-server","ports":[{"containerPort":4443,"name":"main-port","protocol":"TCP"}],"securityContext":{"readOnlyRootFilesystem":true,"runAsNonRoot":true,"runAsUser":1000},"volumeMounts":[{"mountPath":"/tmp","name":"tmp-dir"}]}],"nodeSelector":{"beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64"},"serviceAccountName":"metrics-server","volumes":[{"emptyDir":{},"name":"tmp-dir"}]}}}}
  creationTimestamp: "2020-03-04T16:53:33Z"
  generation: 1
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
  resourceVersion: "1611810"
  selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
  uid: 006e758e-bd33-47d7-8378-d3a8081ee8a8
spec:
--
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        name: metrics-server
        ports:
        - containerPort: 4443
          name: main-port

Finally, my deployment config:

 spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        command:
          - /metrics-server
          - --kubelet-insecure-tls
          - --kubelet-preferred-address-types=InternalIP
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-insecure-tls
          - --kubelet-preferred-address-types=InternalIP
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"

I'm at a loss as to what it could be. I can't get the metrics service to start, and when I try a basic kubectl top node to display any info, all I get is:

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

I have searched the internet and tried adding the args: and command: lines, but no luck:

command:
  - /metrics-server
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP

Can anyone shed light on how to fix this? Thanks

Pastebin log file: Log File

Dropout answered 4/3, 2020 at 17:12 Comment(6)
I'm guessing that you are using kubeadm. Could you provide the versions of kubeadm, kubectl, docker, etc.? Also, what CNI are you using? I've already encountered a similar issue here: #60101898. Could you add hostNetwork: true to your deployment? Did you try to use the Weave Net CNI?Primogenial
System Info: OS Image: Fedora CoreOS 31.20200210.3.0 Operating System: linux Architecture: amd64 Container Runtime Version: docker://18.9.8 Kubelet Version: v1.17.3 Kube-Proxy Version: v1.17.3Dropout
What about CNI? Did you try to run metrics-server deployment with hostNetwork: true?Primogenial
I also uncommented hostNetwork: true from my deployment file and no change.Dropout
Did you also uncomment - --kubelet-preferred-address-types=InternalIP and - --kubelet-insecure-tls? Those 2 flags and hostNetwork: true cannot be commented outPrimogenial
Hi, yes, I will update the original deployment config, but still the same resultDropout

I've reproduced your issue. I have used Calico as CNI.

$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
fedora-master     Ready    master   6m27s   v1.17.3
fedora-worker-1   Ready    <none>   4m48s   v1.17.3
fedora-worker-2   Ready    <none>   4m46s   v1.17.3

fedora-master:~/metrics-server$ kubectl describe apiservice v1beta1.metrics.k8s.io
Status:
  Conditions:
    Last Transition Time:  2020-03-12T16:04:59Z
    Message:               failing or missing response from https://10.99.122.196:443/apis/metrics.k8s.io/v
1beta1: Get https://10.99.122.196:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting
 for connection (Client.Timeout exceeded while awaiting headers)

fedora-master:~/metrics-server$ kubectl top pod
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

When you have only one node in the cluster, the default settings in the metrics-server repo work correctly. The issue occurs when you have more than 2 nodes; I've used 1 master and 2 workers to reproduce it. Below is an example deployment which works correctly (it has all the required args). First, please remove your current metrics-server YAMLs (kubectl delete -f deploy/kubernetes/) and execute:

$ git clone https://github.com/kubernetes-sigs/metrics-server
$ cd metrics-server/deploy/kubernetes/
$ vi metrics-server-deployment.yaml

Paste below YAML:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      hostNetwork: true
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        args:
          - /metrics-server
          - --kubelet-preferred-address-types=InternalIP
          - --kubelet-insecure-tls
          - --cert-dir=/tmp
          - --secure-port=4443
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"

Save and quit using :wq.

$ cd ~/metrics-server
$ kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

Wait a while for metrics-server to gather a few metrics from nodes.
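While waiting, you can watch for the rollout to finish and the APIService to flip to Available (a small sketch using standard kubectl commands against the cluster):

```shell
# Block until the new metrics-server pods are ready
kubectl -n kube-system rollout status deployment/metrics-server
# Watch the aggregated API's status change (Ctrl-C to stop)
kubectl get apiservice v1beta1.metrics.k8s.io -w
```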

$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
...
Metadata:
  Creation Timestamp:  2020-03-12T16:57:58Z
...
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-03-12T16:58:01Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

After a few minutes you can use top.

$ kubectl top nodes
NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
fedora-master     188m         9%     1315Mi          17%       
fedora-worker-1   109m         5%     982Mi           13%       
fedora-worker-2   84m          4%     969Mi           13%   
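Once the APIService reports Available, the raw discovery call from the question should also return node metrics instead of ServiceUnavailable:

```shell
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
```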

If you still encounter issues, please add - --v=6 to the deployment and provide logs from the metrics-server pod.

      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        args:
          - /metrics-server
          - --v=6
          - --kubelet-preferred-address-types=InternalIP
          - --kubelet-insecure-tls
Primogenial answered 12/3, 2020 at 17:5 Comment(9)
Wow, really appreciate the detailed explanation, but I'm still getting the error. Adding the pastebin link to the original post. ThanksDropout
Did you completely remove the earlier metrics-server resources? What CNI are you using? Did you try Weave as the CNI?Primogenial
Yes, I had completely removed the earlier metrics-server resources. I'm using Calico as the CNI. I'm thinking of blasting this k8s cluster and going with a k3s cluster, since I'm also having issues getting Rook Ceph to work.Dropout
I think I might have found the issue: did you have the aggregator enabled when you built your cluster? Since I use Typhoon to build the cluster, I noticed that it is disabled by default.Dropout
For anyone reading this who used posiden.io to build their Kubernetes cluster: you have to enable the aggregation layer; if you look in variables.tf you'll see the arg. Thanks, @Primogenial, really appreciate the detailed explanation and help!Dropout
I've reproduced this on GCE; it was probably enabled by default there. There is also a flag which can be used in the YAML: - --requestheader-allowed-names=aggregator. Glad to hear you were able to solve it.Primogenial
Thanks, it worked for me after adding "hostNetwork: true" for the Flannel network.Hadrian
I diffed your kind: Deployment against what I had deployed to see what you changed. Got this. Applied it, and it's working now. Thanks!Unpile
@Primogenial I added - --requestheader-allowed-names=aggregator in the YAML file, still no luck. I see failing or missing response from x.x.x.x:443/apis/metrics.k8s.io/v1beta1: Get x.x.x.x:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)Brita

You need to carefully check the logs of the calico-node pods. In my case I had some other network interfaces, and the autodetection mechanism in Calico was detecting the wrong interface (IP address). Consult this documentation: https://projectcalico.docs.tigera.io/reference/node/configuration.
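A quick way to pull the relevant autodetection lines from those logs (a sketch; the k8s-app=calico-node label and container name match the standard Calico manifests, so adjust if your install differs):

```shell
# Look for which interface/IP Calico auto-detected on each node
kubectl -n kube-system logs -l k8s-app=calico-node -c calico-node \
  | grep -i "autodetect\|using autodetected"
```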

What I did in my case was simply:

kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=cidr=172.16.8.0/24

cidr is my "working network". After this, all calico-node pods restarted and suddenly everything was fine.

Madelinemadella answered 4/3, 2022 at 16:25 Comment(0)
