CertManager Letsencrypt CertificateRequest "failed to perform self check GET request"

I'm getting "Waiting for http-01 challenge propagation: failed to perform self check GET request". It looks similar to this bug: https://github.com/jetstack/cert-manager/issues/656, but none of the solutions from the GitHub issue comments helped.

I'm trying to set up cert-manager on DigitalOcean as described in this tutorial: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes. I didn't get any errors, but the certificate request from cert-manager has been stuck in the pending state for more than 40 hours.

I have successfully configured Ingress with Nginx, then created a namespace and installed the cert-manager components and CRDs:

$ kubectl create namespace cert-manager
$ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.12.0/cert-manager.yaml

I can see all cert-manager pods running as expected:

$ kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c47f46f57-gxhwv              1/1     Running   0          42h
cert-manager-cainjector-6659d6844d-xp75s   1/1     Running   0          42h
cert-manager-webhook-547567b88f-k4dv2      1/1     Running   0          42h

Then I created a staging issuer:

---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
  namespace: cert-manager
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - http01:
          ingress:
            class: nginx

And updated the Ingress configuration:

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # cert-manager.io/cluster-issuer: "letsencrypt-prod"
    cert-manager.io/cluster-issuer: "letsencrypt-staging"
spec:
  tls:
    - hosts:
        - echo.some.domain
      secretName: ingress-tls
  rules:
    - host: echo.some.domain
      http:
        paths:
          - backend:
              serviceName: echo1
              servicePort: 80

But after that, cert-manager didn't issue the certificate, and it has been stuck in the InProgress state:

$ date
Wed 18 Dec 2019 01:58:08 PM MSK

$ kubectl describe cert

...
Status:
  Conditions:
    Last Transition Time:  2019-12-16T17:23:56Z
    Message:               Waiting for CertificateRequest "ingress-tls-1089568541" to complete
    Reason:                InProgress
    Status:                False
    Type:                  Ready
Events:                    <none>

Instead of the expected Fake LE Intermediate X1 CN, the served certificate has CN=Kubernetes Ingress Controller Fake Certificate,O=Acme Co.

$ kubectl describe CertificateRequest 
Status:
  Conditions:
    Last Transition Time:  2019-12-16T17:50:05Z
    Message:               Waiting on certificate issuance from order default/ingress-tls-1089568541-1576201144: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:                    <none>

What could be the problem with cert-manager, and how can I fix it?


Update:

The Ingress controller logs contain these errors:

$ kubectl -n ingress-nginx logs  nginx-ingress-controller-7754db565c-g557h 

I1218 17:24:30.331127       6 status.go:295] updating Ingress default/cm-acme-http-solver-4dkdn status from [] to [{xxx.xxx.xxx.xxx }]
I1218 17:24:30.333250       6 status.go:295] updating Ingress default/cm-acme-http-solver-9dpqc status from [] to [{xxx.xxx.xxx.xxx }]
I1218 17:24:30.341292       6 event.go:209] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"cm-acme-http-solver-4dkdn", UID:"2e523b74-8bbb-41c7-be8a-44d8db8abd6e", APIVersion:"extensions/v1beta1", ResourceVersion:"722472", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/cm-acme-http-solver-4dkdn
I1218 17:24:30.344340       6 event.go:209] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"cm-acme-http-solver-9dpqc", UID:"b574a3b6-6c5b-4266-a4e2-6ff2de2d78e0", APIVersion:"extensions/v1beta1", ResourceVersion:"722473", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress default/cm-acme-http-solver-9dpqc
W1218 17:24:30.442276       6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
W1218 17:24:30.442950       6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
W1218 17:24:33.775476       6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate
W1218 17:24:33.775956       6 controller.go:1042] Error getting SSL certificate "default/ingress-tls": local SSL certificate default/ingress-tls was not found. Using default certificate

Update2:

The ingress-tls secret exists as expected:

$ kubectl get secret ingress-tls -o yaml

apiVersion: v1
data:
  ca.crt: ""
  tls.crt: ""
  tls.key: <secret-key-data-base64-encoded>
kind: Secret
metadata:
  annotations:
    cert-manager.io/certificate-name: ingress-tls
    cert-manager.io/issuer-kind: ClusterIssuer
    cert-manager.io/issuer-name: letsencrypt-staging
  creationTimestamp: "2019-12-16T17:23:56Z"
  name: ingress-tls
  namespace: default
  resourceVersion: "328801"
  selfLink: /api/v1/namespaces/default/secrets/ingress-tls
  uid: 5d640b66-1572-44a1-94e4-6d85a73bf21c
type: kubernetes.io/tls

Update3:

I've found that the cert-manager pod is logging this error:

E1219 11:06:08.294011       1 sync.go:184] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://<some.domain>/.well-known/acme-challenge/<some-path>': Get http://<some.domain>/.well-known/acme-challenge/<some-path>: dial tcp xxx.xxx.xxx.xxx:80: connect: connection timed out" "dnsName"="<some.domain>" "resource_kind"="Challenge" "resource_name"="ingress-tls-1089568541-1576201144-1086699008" "resource_namespace"="default" "type"="http-01" 

Challenge status:

$ kubectl describe challenge ingress-tls-1089568541-1576201144-471532423

Name:         ingress-tls-1089568541-1576201144-471532423
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1alpha2
Kind:         Challenge
Metadata:
  Creation Timestamp:  2019-12-19T11:32:19Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Owner References:
    API Version:           acme.cert-manager.io/v1alpha2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  ingress-tls-1089568541-1576201144
    UID:                   7d19d86f-0b56-4756-aa20-bb85caf80b9e
  Resource Version:        872062
  Self Link:               /apis/acme.cert-manager.io/v1alpha2/namespaces/default/challenges/ingress-tls-1089568541-1576201144-471532423
  UID:                     503a8b4e-dc60-4080-91d9-2847815af1cc
Spec:
  Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/123456
  Dns Name:   <domain>
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-staging
  Key:      <key>
  Solver:
    http01:
      Ingress:
        Class:  nginx
  Token:        <token>
  Type:         http-01
  URL:          https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/12345/abc
  Wildcard:     false
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://<domain>/.well-known/acme-challenge/<token>': Get http://<domain>/.well-known/acme-challenge/<token>: dial tcp xxx.xxx.xxx.xxx:80: connect: connection timed out
  State:       pending
Events:
  Type    Reason     Age    From          Message
  ----    ------     ----   ----          -------
  Normal  Started    4m28s  cert-manager  Challenge scheduled for processing
  Normal  Presented  4m28s  cert-manager  Presented challenge using http-01 challenge mechanism

I tried to delete the challenge to re-trigger it, but it failed with the same error after one or two minutes. I checked that I can access the challenge URL from the cluster nodes (using kubectl run -it ... and wget http://<domain>/.well-known/acme-challenge/<token> from inside a new pod).
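
For reference, the in-cluster check looked roughly like this (the pod name and image are arbitrary choices of mine; <domain> and <token> come from the Challenge resource):

# start a throwaway busybox pod and fetch the challenge URL from inside the cluster
$ kubectl run self-check --rm -it --restart=Never --image=busybox -- \
    wget -O- http://<domain>/.well-known/acme-challenge/<token>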

Indusium answered 18/12, 2019 at 11:4 Comment(5)
In general, "Kubernetes Ingress Controller Fake Certificate" indicates a problem with the certificate itself or with your setup.Gpo
@mWatney thanks. I've found these logs in the ingress controller (see the update). Where can I find more verbose logs to track down the root cause of this error?Indusium
Double check that this secret (ingress-tls in the default namespace) exists and is correct (post $ kubectl get secret ingress-tls -o yaml here). Check your certificate; sometimes it can't be loaded because of some error, as in this other case.Gpo
@mWatney looks like it's the same issue as in this bug. I don't know why it failed to perform the self checks. I fixed it by fully uninstalling cert-manager and installing it again.Indusium
In my opinion your Q&A is more on-topic on DevOps SE. SO and SF are about software and system engineering respectively.Leighannleighland

This might be worth looking at. I was facing a similar issue with a connection timeout.

Change the LoadBalancer in the ingress-nginx Service: add or change externalTrafficPolicy: Cluster.

The reason is that the pod with the certificate issuer wound up on a different node than the load balancer, so it couldn't talk to itself through the ingress.

Below is the complete block, taken from https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.26.1/deploy/static/provider/cloud-generic.yaml

kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  #CHANGE/ADD THIS
  externalTrafficPolicy: Cluster
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https

---
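
If you prefer not to re-apply the whole manifest, patching the existing Service should have the same effect. A minimal sketch, assuming the Service is named ingress-nginx in the ingress-nginx namespace as above:

# switch the existing Service to the cluster-wide traffic policy
kubectl -n ingress-nginx patch service ingress-nginx \
  --type merge -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'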
Shogunate answered 21/5, 2020 at 4:42 Comment(0)

In my case cert-manager tried to request the challenge via an internal IP address:

failed to perform self check GET request 'http:///.well-known/acme-challenge/': Get http:///.well-known/acme-challenge/: dial tcp 10.67.0.8:80: connect: connection timed out

i.e. the DNS resolution was broken. I fixed this by changing the cert-manager deployment to use only external DNS servers, like so:

spec:
  template:
    spec:
      dnsConfig:
        nameservers:
        - 8.8.8.8
      dnsPolicy: None

This is how you do it. I also created an issue so this can be configured via the Helm installation.
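
A minimal sketch of applying the same change to an existing installation with a patch, assuming the deployment is called cert-manager in the cert-manager namespace:

# make the cert-manager pod resolve names via an external resolver only
kubectl -n cert-manager patch deployment cert-manager --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"None","dnsConfig":{"nameservers":["8.8.8.8"]}}}}}'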

Succinylsulfathiazole answered 30/5, 2020 at 22:21 Comment(0)

I had the exact same issue; it seems to be related to a bug in how the DigitalOcean load balancer works. This thread, lets-encrypt-certificate-issuance, suggested adding the annotation service.beta.kubernetes.io/do-loadbalancer-hostname: "kube.mydomain.com" to the load balancer. In my case I did not have a YAML config file for the load balancer, so I just copied the load balancer declaration from the nginx-ingress install script and applied the new configuration to the Kubernetes cluster. Below is the final config for the load balancer.

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: 'true'
    # See https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
    service.beta.kubernetes.io/do-loadbalancer-hostname: "kube.mydomain.com"
  labels:
    helm.sh/chart: ingress-nginx-3.19.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.43.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
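
If you'd rather not replace the whole Service, annotating the existing one in place should work too. A sketch, with the hostname as a placeholder; adjust the Service name and namespace to your install:

# add the DO hostname annotation to the running Service
kubectl -n ingress-nginx annotate service ingress-nginx-controller \
  service.beta.kubernetes.io/do-loadbalancer-hostname="kube.mydomain.com" --overwrite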
Volution answered 25/1, 2021 at 6:14 Comment(2)
This solved my problem, I just added the service.beta.kubernetes.io/do-loadbalancer-hostname annotation. I edited the resource with kubectl edit service ingress-nginx-controller -n ingress-nginx. Keep in mind that you might have a different name and namespace.Armenian
+1 This also helped me. It fixed the internal endpoints of my bound services from an arbitrary IP address to the specified hostname. After applying the configuration change, the Issuer was able to complete the challenge and the SSL certificate was provisioned correctly.Reddy

I didn't find the reason for this issue, so I'll post how I solved it as an answer. It looks like the same issue as in this bug. I fixed it by fully uninstalling cert-manager and installing it again, without changing any configuration or settings.
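
For reference, the reinstall with the same v0.12.0 manifest used in the question looks roughly like this (note that deleting the manifest also removes the cert-manager CRDs, and with them any existing Issuer/Certificate resources):

kubectl delete -f https://github.com/jetstack/cert-manager/releases/download/v0.12.0/cert-manager.yaml
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.12.0/cert-manager.yaml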

Indusium answered 14/5, 2020 at 8:54 Comment(0)

One of my cert-manager pods was frozen, so I deleted them all and they restarted. The certs renewed immediately.

kubectl get pods -n cert-manager (or whatever namespace your pods are in)

Then delete them all.

kubectl delete pod -n cert-manager cert-manager-xxxx cert-manager-cainjector-xxxx cert-manager-webhook-xxxx
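
A shorter equivalent, assuming the default deployment names from the cert-manager manifest, is to restart the deployments and let them recreate the pods:

kubectl -n cert-manager rollout restart deployment cert-manager cert-manager-cainjector cert-manager-webhook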

Tinney answered 4/5, 2021 at 21:50 Comment(0)

For DigitalOcean k8s deployments, the generic nginx-ingress won't work. Install their nginx-ingress Helm chart from the marketplace, add service.beta.kubernetes.io/do-loadbalancer-hostname: "kube.mydomain.com" to the annotations, and you are good to go.
It is unfortunate that these cloud providers all require their own customization for such basic apps, and what works on one does not work on another. The most cloud-agnostic way seems to be to run our own VMs/instances and deploy k8s there with provisioners like Rancher or RKE, so that we don't keep customizing our deployments across clouds.
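
If you install the chart with Helm, the annotation can also be set through chart values. A sketch, assuming the community ingress-nginx chart layout (the marketplace chart may differ); release name, namespace, and hostname are placeholders:

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set-string controller.service.annotations."service\.beta\.kubernetes\.io/do-loadbalancer-hostname"=kube.mydomain.com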

Purl answered 7/5, 2023 at 22:43 Comment(0)
