Does Alpine have a known DNS issue within Kubernetes?

Lately, we've faced some DNS issues with microservices based on an Alpine image (node:12.18.1-alpine) on EKS when trying to resolve "big" DNS queries (when the answer is larger than 512 bytes).

So I tried running this script to test DNS resolution:

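const dns = require('dns');

// dns.lookup() resolves through the OS resolver (getaddrinfo), i.e. the image's libc
dns.lookup('hugedns.test.dziemba.net', { all: true }, (err, addresses) => {
  if (err) return console.error('lookup failed:', err.code);
  console.log(addresses);
});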

in two different scenarios for each image:

  1. node:12.18.1-alpine
     • Running the image on my laptop - resolved successfully
     • Running the image on EKS 1.16 - failed to resolve
  2. node:12.18.1-slim
     • Running the image on my laptop - resolved successfully
     • Running the image on EKS 1.16 - resolved successfully

From what I saw, Alpine uses the musl libc instead of glibc, and musl apparently doesn't support DNS over TCP. DNS normally uses UDP and falls back to TCP only when the response is larger than 512 bytes, so my theory was that this is the root cause. But since it works on my laptop and fails on EKS, I'm wondering where the issue could lie...
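To make the UDP/TCP theory concrete, here is a minimal sketch in Node.js (Buffer API only, no network I/O) of the transport rule from RFC 1035: a query is sent over UDP first, and if the answer does not fit in 512 bytes the server sets the TC (truncated) flag, after which the client is expected to repeat the query over TCP, where each message carries a 2-byte length prefix. A resolver that never performs that retry may be left with an incomplete or empty answer. The function names buildQuery, isTruncated, frameForTcp and nextStep are hypothetical illustrations, not part of musl, glibc or Node.

```javascript
// Sketch of the classic DNS transport rule (RFC 1035, sections 4.2.1 and 4.2.2).
// Hypothetical helpers for illustration only; nothing here touches the network.

// Build a minimal query: 12-byte header + one question (QTYPE A, QCLASS IN).
function buildQuery(name, id = 0x1234) {
  const header = Buffer.alloc(12);
  header.writeUInt16BE(id, 0);     // transaction ID
  header.writeUInt16BE(0x0100, 2); // flags: RD (recursion desired) only
  header.writeUInt16BE(1, 4);      // QDCOUNT = 1, other counts stay 0
  const labels = name.split('.').filter(Boolean).map((label) => {
    const bytes = Buffer.from(label, 'ascii');
    return Buffer.concat([Buffer.from([bytes.length]), bytes]); // length-prefixed label
  });
  const tail = Buffer.alloc(5);    // root label (0x00), then QTYPE and QCLASS
  tail.writeUInt16BE(1, 1);        // QTYPE = 1 (A)
  tail.writeUInt16BE(1, 3);        // QCLASS = 1 (IN)
  return Buffer.concat([header, ...labels, tail]);
}

// TC is bit 0x0200 of the 16-bit flags word at offset 2 of any DNS message.
function isTruncated(message) {
  return (message.readUInt16BE(2) & 0x0200) !== 0;
}

// Over TCP, each DNS message is preceded by its length as a 2-byte big-endian integer.
function frameForTcp(message) {
  const prefix = Buffer.alloc(2);
  prefix.writeUInt16BE(message.length, 0);
  return Buffer.concat([prefix, message]);
}

// What a compliant stub resolver does after receiving a UDP reply.
function nextStep(udpReply) {
  return isTruncated(udpReply) ? 'retry over TCP' : 'use UDP answer';
}
```

The query for hugedns.test.dziemba.net is only 42 bytes; it is the answer for that name that is too big for a single UDP reply, which is exactly the case where the TCP retry matters.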

Any thoughts?

EKS v1.16 coredns:v1.6.6

BTW, this is my first post; let me know if any more information is needed.

Linwoodlinz answered 7/12, 2020 at 11:35 Comment(1)
What's an example of a "big" DNS query (when the answer is larger than 512 bytes)? - Mannerheim

Yes, Alpine images are known to be problematic in Kubernetes clusters when it comes to DNS queries.

It is not clear whether the bug has actually been fixed in any current version of Alpine, but here are some related links:

I encountered this problem in my own Kubernetes clusters as of January 2021 with up-to-date Alpine 3.12 images, so I assume it is not fixed.

The core problem seems to be that musl stops searching through the domains listed in the search directive of /etc/resolv.conf as soon as it gets an unexpected response for a candidate name (basically anything other than a clear indication that the FQDN does not exist or has been found).
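Here is a deliberately simplified, hypothetical model of that behavior (not musl's actual code): a search-list walk in which a reply that is neither an answer nor a clear "name does not exist" either aborts the whole lookup or is skipped, depending on the abortOnUnexpected flag. The names walkSearchList and fakeQuery are made up for illustration, and 192.0.2.1 is a documentation address used as a placeholder answer.

```javascript
// Hypothetical, simplified model of walking a search list; NOT musl's source.
// Each query returns { rcode, answers }:
//   NOERROR with answers           -> found, stop
//   NXDOMAIN, or NOERROR w/o data  -> name clearly absent, try the next candidate
//   anything else (e.g. SERVFAIL)  -> "unexpected"; handling depends on the policy
function walkSearchList(candidates, query, { abortOnUnexpected }) {
  const tried = [];
  for (const fqdn of candidates) {
    tried.push(fqdn);
    const reply = query(fqdn);
    if (reply.rcode === 'NOERROR' && reply.answers.length > 0) {
      return { ok: true, fqdn, answers: reply.answers, tried };
    }
    const clearlyAbsent = reply.rcode === 'NXDOMAIN' || reply.rcode === 'NOERROR';
    if (!clearlyAbsent && abortOnUnexpected) {
      // Behavior described above: the whole lookup fails at the first unexpected reply.
      return { ok: false, error: reply.rcode, tried };
    }
  }
  return { ok: false, error: 'NXDOMAIN', tried };
}

// Hypothetical upstream in which one intermediate candidate gets a SERVFAIL.
const chain = [
  'www.google.com.example.svc.cluster.local',
  'www.google.com.svc.cluster.local',
  'www.google.com.cluster.local',
  'www.google.com',
];
function fakeQuery(fqdn) {
  if (fqdn === 'www.google.com') return { rcode: 'NOERROR', answers: ['192.0.2.1'] };
  if (fqdn === 'www.google.com.svc.cluster.local') return { rcode: 'SERVFAIL', answers: [] };
  return { rcode: 'NXDOMAIN', answers: [] };
}

console.log(walkSearchList(chain, fakeQuery, { abortOnUnexpected: true }));
console.log(walkSearchList(chain, fakeQuery, { abortOnUnexpected: false }));
```

With abortOnUnexpected set, the lookup fails with SERVFAIL after only two queries, even though the bare name www.google.com would have resolved; without it, the walk reaches the last candidate and succeeds.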

This behavior does not play well with the way Kubernetes handles name resolution in pods.

Indeed, one can see that the typical /etc/resolv.conf of a pod in the example namespace is the following:

nameserver 10.3.0.10
search example.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

The strategy is that a name, for instance my-service or www.google.com, is tried against each of the domains specified in the search directive. In this example, that gives the FQDN chains my-service.example.svc.cluster.local, my-service.svc.cluster.local, my-service.cluster.local, my-service and www.google.com.example.svc.cluster.local, www.google.com.svc.cluster.local, www.google.com.cluster.local, www.google.com. Obviously, it is the first FQDN of the first chain (my-service.example.svc.cluster.local) and the last FQDN of the second chain (www.google.com) that would resolve correctly.
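As a sketch of that expansion, the snippet below parses the three directives from the example /etc/resolv.conf and builds the candidate chain for a name, assuming (as in the examples above) that the name has fewer dots than ndots. parseResolvConf and searchChain are hypothetical helpers; only the nameserver, search and options ndots:N directives are handled, and ndots falls back to 1, its documented default, when it is absent.

```javascript
// Hypothetical helpers: parse a resolv.conf and expand a short name
// through its search list (only nameserver, search and ndots are handled).
function parseResolvConf(text) {
  const conf = { nameservers: [], search: [], ndots: 1 }; // ndots defaults to 1
  for (const line of text.split('\n')) {
    const [keyword, ...args] = line.trim().split(/\s+/);
    if (keyword === 'nameserver') conf.nameservers.push(args[0]);
    if (keyword === 'search') conf.search = args;
    if (keyword === 'options') {
      for (const opt of args) {
        if (opt.startsWith('ndots:')) conf.ndots = Number(opt.slice('ndots:'.length));
      }
    }
  }
  return conf;
}

// Search domains are appended in order; the bare name is tried last.
function searchChain(name, searchDomains) {
  return [...searchDomains.map((domain) => `${name}.${domain}`), name];
}

const conf = parseResolvConf([
  'nameserver 10.3.0.10',
  'search example.svc.cluster.local svc.cluster.local cluster.local',
  'options ndots:5',
].join('\n'));

console.log(conf);
console.log(searchChain('my-service', conf.search));
console.log(searchChain('www.google.com', conf.search));
```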

This strategy is designed to optimize resolution of the cluster's internal names, so that names like my-service, my-service.my-namespace or my-service.my-namespace.svc resolve out of the box.

The ndots parameter in the options directive defines the minimum number of dots a name must contain to be considered an FQDN, in which case the search chain is skipped in favor of a direct DNS resolution attempt. With ndots:2, www.google.com would be considered an FQDN, while my-service.my-namespace would still go through the search chain.
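A minimal sketch of that decision rule, using a hypothetical usesSearchChain helper. It also encodes one detail not mentioned above: a name ending in a dot (for example www.google.com.) is always treated as fully qualified, whatever the ndots value.

```javascript
// Decision rule as described above: a name with at least `ndots` dots is
// treated as an FQDN and resolved directly; otherwise the search chain is used.
// A trailing dot (e.g. "www.google.com.") always marks a name as absolute.
function usesSearchChain(name, ndots) {
  if (name.endsWith('.')) return false;
  const dots = name.split('.').length - 1;
  return dots < ndots;
}

for (const ndots of [5, 2]) {
  for (const name of ['www.google.com', 'my-service.my-namespace']) {
    const mode = usesSearchChain(name, ndots) ? 'search chain' : 'direct lookup';
    console.log(`ndots:${ndots}  ${name} -> ${mode}`);
  }
}
```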

Given that the search directive lists 3 possible domains, that ndots:5 means almost no ordinary hostname is considered an FQDN, and that musl in Alpine images breaks out of the search loop on unexpected responses, the probability of a host resolution failure in an Alpine container running in Kubernetes increases dramatically. If host resolution runs regularly in your process, for instance inside a loop, you will encounter a lot of failures that need to be handled.
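To put rough numbers on that, here is a back-of-the-envelope sketch. The per-query probability p of an unexpected reply is purely an assumption, and failures are assumed to be independent: if any unexpected reply aborts the lookup, a lookup that goes through k queries fails with probability 1 - (1 - p)^k. For www.google.com with three search domains and ndots:5, k is 4 (the four candidate names); treated as an FQDN, k is 1. lookupFailureProbability is a hypothetical helper.

```javascript
// Back-of-the-envelope sketch. p is a HYPOTHETICAL probability that any single
// query gets an "unexpected" reply; if one such reply aborts the whole lookup,
// a lookup that needs k queries succeeds only when all k replies are clean.
function lookupFailureProbability(p, k) {
  return 1 - Math.pow(1 - p, k);
}

const p = 0.01; // assumed, for illustration only
// www.google.com with ndots:5 and 3 search domains: 4 candidate names.
console.log('search chain (k=4):', lookupFailureProbability(p, 4).toFixed(4));
// Same name treated as an FQDN (e.g. with a lower ndots): 1 candidate.
console.log('direct lookup (k=1):', lookupFailureProbability(p, 1).toFixed(4));
```

With the assumed p = 0.01, the search-chain case fails about four times as often as the direct lookup (roughly 3.9% versus 1%), and every lookup in a loop pays that cost again.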

What to do about this?

  • you can set the pod's dnsConfig to reduce ndots, so that shorter names are considered FQDNs and skip the search loop (see https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html)
  • you can write an entrypoint script for your image that modifies /etc/resolv.conf according to your needs; for instance, cat /etc/resolv.conf | sed -r "s/^(search.*|options.*)/#\1/" > /tmp/resolv && cat /tmp/resolv > /etc/resolv.conf comments out the search and options lines, which is fine if your process does not rely on any internal cluster names
  • you can edit /etc/hosts directly to hardcode some FQDNs to their known IPs
  • you can move away from Alpine images and use UBI or Debian/Ubuntu-based images
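Since the application in the question is Node.js, the first two workarounds can be sketched in JavaScript. commentOutSearchAndOptions is a hypothetical helper that applies the same transformation as the sed one-liner to the file's text (reading and writing /etc/resolv.conf is left out, and would need write access inside the container); podDnsConfig shows the shape of the Kubernetes pod spec dnsConfig field that lowers ndots, with 2 as an arbitrary example value.

```javascript
// Same transformation as the sed one-liner: prefix every line that starts
// with "search" or "options" with "#", leaving nameserver lines alone.
function commentOutSearchAndOptions(resolvConf) {
  return resolvConf.replace(/^(search.*|options.*)$/gm, '#$1');
}

// Pod-level alternative: lower ndots through the pod spec's dnsConfig.
// The value "2" is only an example; option values are strings in the pod spec.
const podDnsConfig = {
  dnsConfig: {
    options: [{ name: 'ndots', value: '2' }],
  },
};

const original = [
  'nameserver 10.3.0.10',
  'search example.svc.cluster.local svc.cluster.local cluster.local',
  'options ndots:5',
].join('\n');

console.log(commentOutSearchAndOptions(original));
console.log(JSON.stringify(podDnsConfig, null, 2));
```

The dnsConfig route is usually less intrusive, since it keeps the search list (and therefore short in-cluster names) while letting names with enough dots skip it.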

Personally, I started with Alpine, like a lot of us in the early days of Docker industrialization, because the other full OS images were insanely big. This is mostly no longer the case, with well-tested slim images for Ubuntu or Debian, or even Kubernetes-centric initiatives like UBI. That is why I usually choose the last alternative (moving away from Alpine images).

Whitethroat answered 6/1, 2021 at 9:50 Comment(2)
Actually, all you need is to install bind-tools on Alpine! Even on Ubuntu you need to install dnsutils for it to work! - Atul
The problem described here is not that someone is completely unable to look up a DNS entry, but whether one can RELIABLY look up a DNS entry every time. Simply installing bind-tools will not make DNS lookups consistent and reliable. - Rickirickie

For all of your applications/pods, you need to install bind-tools on Alpine:

apk update && apk add bind-tools

or you can add it to your Dockerfile:

RUN apk update && apk add bind-tools

For more information, check this out: https://github.com/nodejs/docker-node/issues/339

Atul answered 4/1, 2022 at 11:27 Comment(1)
Yeah, it actually works. - Adamo

Well, it seems I've fixed it, though I'm not sure I fully understand why it was failing. The default Corefile of CoreDNS looks like this:

.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    hosts /etc/coredns/NodeHosts {
      ttl 60
      reload 15s
      fallthrough
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
import /etc/coredns/custom/*.server

replacing

 forward . /etc/resolv.conf

with

 forward . 8.8.8.8

This makes Alpine work with any ndots value. Here 8.8.8.8 is used to resolve non-local domain names; it can be replaced with your own DNS server for public name resolution.
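The Corefile change itself is a one-line text edit; below is a hypothetical setForwardUpstream helper that performs it on the Corefile text. In a cluster the Corefile normally lives in the coredns ConfigMap in kube-system, so this only illustrates the edit. Note that forwarding to 8.8.8.8 bypasses whatever upstream the CoreDNS pod's /etc/resolv.conf pointed to, so names that only that upstream could resolve (for example private zones) would stop resolving.

```javascript
// Hypothetical helper: point the "forward ." line of a Corefile at a different
// upstream. Only the first upstream token is replaced; anything after it on
// the line (more upstreams, an opening "{") is kept as-is.
function setForwardUpstream(corefile, upstream) {
  return corefile.replace(/^([ \t]*forward \.)[ \t]+\S+/m, `$1 ${upstream}`);
}

const before = [
  '.:53 {',
  '    errors',
  '    forward . /etc/resolv.conf',
  '    cache 30',
  '}',
].join('\n');

console.log(setForwardUpstream(before, '8.8.8.8'));
```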

Dawkins answered 26/2, 2023 at 9:11 Comment(0)

Installing bind-tools only fixes the shell tools for DNS lookup (dig, nslookup and so on); we had to move off the Alpine base image to Debian or Ubuntu for Node.js to work in EKS.
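One way to see why installing bind-tools does not change Node's behavior is to compare Node's two resolver paths from inside the pod. Per the Node.js documentation, dns.lookup() goes through the operating system's getaddrinfo() (musl on Alpine) and is what the net and http modules use by default, while dns.resolve4() sends queries itself through the bundled c-ares library; dig from bind-tools likewise does its own queries. compareResolvers and summarize are hypothetical helpers for this comparison, and as far as I know c-ares retries over TCP on truncated replies, so the two paths may disagree for large answers.

```javascript
const dns = require('dns').promises;

// dns.lookup() calls getaddrinfo() from the C library (musl on Alpine); Node's
// net and http modules use it by default, so application traffic goes this way.
// dns.resolve4() uses the c-ares library bundled with Node and skips getaddrinfo().
function summarize(settled) {
  return settled.status === 'fulfilled'
    ? settled.value
    : `error: ${settled.reason && settled.reason.code}`;
}

async function compareResolvers(hostname) {
  const [viaLibc, viaCares] = await Promise.allSettled([
    dns.lookup(hostname, { all: true }),
    dns.resolve4(hostname),
  ]);
  return { lookup: summarize(viaLibc), resolve4: summarize(viaCares) };
}
```

For example, running compareResolvers('hugedns.test.dziemba.net').then(console.log) in an Alpine-based pod and in a Debian-based pod would show whether only the getaddrinfo() path fails.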

Alcoholize answered 30/3, 2022 at 1:48 Comment(12)
Why? What does Node rely on for DNS requests? - Uboat
HTTP REST API calls in our application and RDS DB calls... everything is coming back with this error. - Alcoholize
We have a working solution using the official Node.js Alpine images. You can check it out here: hub.docker.com/_/node - Atul
I don't see the solution you're talking about. Can you be more specific? As far as I can tell this seems to only affect EKS, and we're looking to move away from Alpine Linux base images in Kubernetes because of it. - Alcoholize
@Uboat Node probably relies on getaddrinfo(), which comes from the OS libc implementation, and musl doesn't support DNS requests over TCP. Every time you use a name instead of an IP, it needs to be resolved into an IP using DNS. - Penna
@TimothyHutz Maybe the server is returning a large answer for those names. For bigger DNS responses it is necessary to use DNS over TCP instead of the default DNS over UDP. musl, Alpine's libc implementation, doesn't support DNS over TCP. GNU libc does, so any glibc-based distribution should work if this is the problem. - Penna
@Penna What I was going for is more information on why this solution works. If it's clear to you, can you edit the answer and add those details inline? - Uboat
A relevant post regarding the underlying protocol: serverfault.com/questions/404840/… - Uboat
@Timothy Hutz We also use EKS; I updated my answer, check it out! - Atul
@Timothy Hutz It's the link to the Docker image, not the solution. As I mentioned in my answer, you can add this line to your Dockerfile: RUN apk update && apk add bind-tools - Atul
Even on Ubuntu, you need to install dnsutils for it to work correctly: sudo apt-get update && sudo apt-get install dnsutils - Atul
And also yeah, your problem is exactly what you just stated: shell tools for DNS lookup! - Atul

© 2022 - 2024 — McMap. All rights reserved.