containerd error "failed to find user by uid" when creating EJBCA Docker container on Azure

When I try to create an Azure container instance for EJBCA-ce I get an error and cannot see any logs.

I expect the following result (screenshot: Azure portal container instance events, success).

But I get the following error (screenshot: Azure portal container instance events, failure):

Failed to start container my-azure-container-resource-name, Error response: to create containerd task: failed to create container e9e48a_________ffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown

Some context:

I run the container on Azure Container Instances.

I tried

  • from ARM template
  • from Azure Portal.
  • with file share mounted
  • with database env variable
  • without any env variables

It runs fine locally using the same env variable (database configuration). It used to run with the same configuration a couple weeks ago.

Here are some logs I get when I attach the container group from az cli.

(count: 1) (last timestamp: 2020-11-03 16:04:32+00:00) pulling image "primekey/ejbca-ce:6.15.2.3"
(count: 1) (last timestamp: 2020-11-03 16:04:37+00:00) Successfully pulled image "primekey/ejbca-ce:6.15.2.3"
(count: 28) (last timestamp: 2020-11-03 16:27:52+00:00) Error: Failed to start container aci-pulsy-ccm-ejbca-snd, Error response: to create containerd task: failed to create container e9e48a06807fba124dc29633dab10f6229fdc5583a95eb2b79467fe7cdffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown

An extract of the dockerfile from dockerhub

I suspect the issue might be related to the USER 0 and USER 10001 instructions, which appear several times in the Dockerfile.

COPY dir:89ead00b20d79e0110fefa4ac30a827722309baa7d7d74bf99910b35c665d200 in /
/bin/sh -c rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CMD ["/bin/bash"]
USER 0
COPY dir:893e424bc63d1872ee580dfed4125a0bef1fa452b8ae89aa267d83063ce36025 in /opt/primekey
COPY dir:756f0fe274b13cf418a2e3222e3f6c2e676b174f747ac059a95711db0097f283 in /licenses
USER 10001
CMD ["/opt/primekey/wildfly-14.0.1.Final/bin/standalone.sh" "-b" "0.0.0.0"]
MAINTAINER PrimeKey Solutions AB
ARG releaseTag
ARG releaseEdition

ARM template

{
      "type": "Microsoft.ContainerInstance/containerGroups",
      "apiVersion": "2019-12-01",
      "name": "[variables('ejbcaContainerGroupName')]",
      "location": "[parameters('location')]",
      "tags": "[variables('tags')]",
      "dependsOn": [
        "[resourceId('Microsoft.DBforMariaDB/servers', variables('ejbcaMariadbServerName'))]",
        "[resourceId('Microsoft.DBforMariaDB/servers/databases', variables('ejbcaMariadbServerName'), variables('ejbcaMariadbDatabaseName'))]"
      ],
      "properties": {
        "sku": "Standard",
        "containers": [
          {
            "name": "[variables('ejbcaContainerName')]",
            "properties": {
              "image": "primekey/ejbca-ce:6.15.2.3",
              "ports": [
                {
                  "protocol": "TCP",
                  "port": 443
                },
                {
                  "protocol": "TCP",
                  "port": 8443
                }
              ],
              "environmentVariables": [
                {
                  "name": "DATABASE_USER",
                  "value": "[concat(parameters('mariadbUser'),'@', variables('ejbcaMariadbServerName'))]"
                },
                {
                  "name": "DATABASE_JDBC_URL",
                  "value": "[variables('ejbcaEnvVariableJdbcUrl')]"
                },
                {
                  "name": "DATABASE_PASSWORD",
                  "secureValue": "[parameters('mariadbAdminPassword')]"
                }
              ],
              "resources": {
                "requests": {
                  "memoryInGB": 1.5,
                  "cpu": 2
                }
              },
              "volumeMounts": [
                {
                  "name": "certificates",
                  "mountPath": "/mnt/external/secrets"
                }
              ]
            }
          }
        ],
        "initContainers": [],
        "restartPolicy": "OnFailure",
        "ipAddress": {
          "ports": [
            {
              "protocol": "TCP",
              "port": 443
            },
            {
              "protocol": "TCP",
              "port": 8443
            }
          ],
          "type": "Public",
          "dnsNameLabel": "[parameters('ejbcaContainerGroupDNSLabel')]"
        },
        "osType": "Linux",
        "volumes": [
          {
            "name": "certificates",
            "azureFile": {
              "shareName": "[parameters('ejbcaCertsFileShareName')]",
              "storageAccountName": "[parameters('ejbcaStorageAccountName')]",
              "storageAccountKey": "[parameters('ejbcaStorageAccountKey')]"
            }
          }
        ]
      }
    }

It runs fine on my local machine on Linux (Ubuntu 20.04):

docker run -it --rm -p 8080:8080 -p 8443:8443 -h localhost -e DATABASE_USER="mymaridbuser@my-db" -e DATABASE_JDBC_URL="jdbc:mariadb://my-azure-domain.mariadb.database.azure.com:3306/ejbca?useSSL=true" -e DATABASE_PASSWORD="my-pwd" primekey/ejbca-ce:6.15.2.3
Arte answered 3/11, 2020 at 17:35 Comment(1)
It looks like Azure doesn't use userns for Docker, or perhaps they are using another container runtime and not Docker itself. You could probably use this image from the Azure Marketplace instead, because if it doesn't work you can get support from Bitnami: azuremarketplace.microsoft.com/en-us/marketplace/apps/… - Hobnob

In the EJBCA-ce container image, I think they are trying to run the EJBCA server as a non-root user. According to the Docker documentation:

The USER instruction sets the user name (or UID) and optionally the user group (or GID) to use when running the image and for any RUN, CMD and ENTRYPOINT instructions that follow it in the Dockerfile

In the Dockerfile they reference two users, root, corresponding to UID 0, and another one, with UID 10001.

UID ranges are conventions that depend on the operating system and on local user management practice. On most current Linux distributions the first regular user account gets UID 1000 (500 on older Red Hat based systems), so a high value such as 10001, as used here, is unlikely to collide with an existing account. See, for instance, the UID entry in Wikipedia or this article.

AFAIK, with Docker the UID given in USER does not need to exist in the container's /etc/passwd for the container to run correctly: in fact, if you run the image locally, it starts without any problem.

The user with UID 10001 is actually set up in your container by the script that is run by the CMD defined in the Dockerfile, /opt/primekey/bin/start.sh, in this code fragment:

if ! whoami &> /dev/null; then
  if [ -w /etc/passwd ]; then
    echo "${APPLICATION_NAME}:x:$(id -u):0:${APPLICATION_NAME} user:/opt:/sbin/nologin" >> /etc/passwd
  fi
fi

Please, be aware that APPLICATION_NAME in this context takes the value ejbca and that the user which runs this script, as indicated in the Dockerfile, is 10001. That will be the value provided by the command id -u in this code.

You can verify it if you run your container locally:

docker run -it -p 8080:8080 -p 8443:8443 -h localhost primekey/ejbca-ce:6.15.2.3

And open a bash shell in it:

docker exec -it container_name /bin/bash

If you run whoami, it will tell you ejbca.

If you run id it will give you the following output:

uid=10001(ejbca) gid=0(root) groups=0(root)

You can verify the user existence in the /etc/passwd as well:

bash-4.2$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
ejbca:x:10001:0:ejbca user:/opt:/sbin/nologin

Pierre did not get this output because he ran the container overriding the provided CMD, so the start.sh script responsible for creating the user was never executed.

For some reason, and this is where my knowledge fails me, Azure fails to run your container because the user 10001 referenced by the USER instruction does not exist in /etc/passwd when the container is created.

I think it could be related to the use of containerd instead of Docker.

The error reported by Azure seems related to the Microsoft project opengcs.

They say about the project:

Open Guest Compute Service is a Linux open source project to further the development of a production quality implementation of Linux Hyper-V container on Windows (LCOW). It's designed to run inside a custom Linux OS for supporting Linux container payload.

And:

The focus of LCOW v2 as a replacement of LCOW v1 is through the coordination and work that has gone into containerd/containerd and its Runtime V2 interface. To see our containerd hostside shim please look here Microsoft/hcsshim/cmd/containerd-shim-runhcs-v1.

The error you see in the console is raised by the spec.go file that you can find in their code base, when they are trying to establish the user on behalf of whom the container process should be run:

func setUserID(spec *oci.Spec, uid int) error {
    u, err := getUser(spec, func(u user.User) bool {
        return u.Uid == uid
    })
    if err != nil {
        return errors.Wrapf(err, "failed to find user by uid: %d", uid)
    }
    spec.Process.User.UID, spec.Process.User.GID = uint32(u.Uid), uint32(u.Gid)
    return nil
}

This function is called from this other code fragment (you can see the full function code here):

parts := strings.Split(userstr, ":")
switch len(parts) {
case 1:
    v, err := strconv.Atoi(parts[0])
    if err != nil {
        // evaluate username to uid/gid
        return setUsername(spec, userstr)
    }
    return setUserID(spec, int(v))

And the getUser function:

func getUser(spec *oci.Spec, filter func(user.User) bool) (user.User, error) {
    users, err := user.ParsePasswdFileFilter(filepath.Join(spec.Root.Path, "/etc/passwd"), filter)
    if err != nil {
        return user.User{}, err
    }
    if len(users) != 1 {
        return user.User{}, errors.Errorf("expected exactly 1 user matched '%d'", len(users))
    }
    return users[0], nil
}

As you can see, these are exactly the error messages that Azure is reporting to you.
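To make the rule in getUser concrete, here is a minimal shell sketch of the same "exactly one match" check, run against a passwd-style file. The function names count_uid_matches and check_uid are hypothetical and are not part of hcsshim or opengcs; the sketch only approximates the lookup and ignores everything else the guest does.

```shell
# Hypothetical helper: count the entries of a passwd-style file ($2)
# whose third field (the UID) equals $1.
count_uid_matches() {
    awk -F: -v uid="$1" '$3 == uid { n++ } END { print n + 0 }' "$2"
}

# Hypothetical helper: mimic the "exactly one user" rule from getUser.
# Prints a message in the same shape as the Azure error and returns
# non-zero when zero or several entries match.
check_uid() {
    matches=$(count_uid_matches "$1" "$2")
    if [ "$matches" -ne 1 ]; then
        echo "failed to find user by uid: $1: expected exactly 1 user matched '$matches'" >&2
        return 1
    fi
    return 0
}
```

Run against a copy of the /etc/passwd from the unmodified image, check_uid 10001 should fail with a count of 0, because no entry has that UID until start.sh has appended one.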

In summary, I think they are providing a Windows LCOW solution that conforms to the OCI Image Format Specification and is suitable for running containers with containerd.

Since you indicated that it used to run with the same configuration a couple of weeks ago, my best guess is that they switched your containers from a pure Linux containerd runtime implementation to one based on Windows and the software mentioned above, and this is why your containers are now failing.

A possible workaround is to create a custom image based on the official one provided by PrimeKey and add the user 10001 to it, as Pierre also pointed out.

To accomplish this task, first, create a new custom Dockerfile. You can try, for instance:

FROM primekey/ejbca-ce:6.15.2.3

USER 0

RUN echo "ejbca:x:10001:0:ejbca user:/opt:/sbin/nologin" >> /etc/passwd

USER 10001

Please, note that you may need to define some of the environment variables from the official EJBCA image.

With this Dockerfile you can build your image with docker or docker compose with an appropriate docker-compose.yaml file, something like:

version: "3"

services:
  ejbca:
    image: <your repository>/ejbca
    build: .
    ports:
      - "8080:8080"
      - "8443:8443"

Please, customize it as you consider appropriate.

With this setup the new container will still run properly in a local environment in the same way as the original one: I hope it will be also the case in Azure.

Ensheathe answered 19/11, 2020 at 9:37 Comment(11)
@MarcBouvier I updated the answer with information about the code I think Microsoft is using in the Azure containers implementation. As indicated in the answer, they probably switched your containers from a Linux containerd runtime implementation to one based on Windows and the indicated software, and this is why your containers are now failing. I am afraid there is no clear solution to your problem. My best advice is to try one of the proposed workarounds or contact Microsoft support. - Ensheathe
Here is the answer from Azure support. It seems to match your hypothesis. "The image requires multiple users hub.docker.com/layers/primekey/ejbca-ce/6.15.2.3/images/… which is not supported on the container runtime." They propose to "pin the subscription to deploy only on container infrastructure" as a resolution. - Arte
I have dug into the details of your new inputs. Thank you for such a detailed answer. Let me accept your answer when the resolution of the support ticket confirms it. - Arte
You are welcome @MarcBouvier. And thank you very much for the feedback. I'm sorry I wasn't able to help you more with my answer. Please let me know if you think I can be of any help. - Ensheathe
@MarcBouvier Thank you very much, I really appreciate that you created a new bounty to award an existing answer; it was unnecessary. Please, if you ever need any help, let me know, I'll be happy to help if I can. - Ensheathe
In the end I also created an image based on the official one, adding the user 10001. It works. - Arte
It's fine @MarcBouvier. I am very happy to hear that you were finally able to deploy EJBCA. Great product by the way! - Ensheathe
Yes, at first, I was not very happy to maintain my own Docker image. But it is very simple: a few Dockerfile lines and a very straightforward build pipeline. It is worth this small effort. - Arte
Ederra! This is a hell of an answer. - Girth
Thank you very much @aran!! I really appreciate your comment. - Ensheathe
Sorry @MarcBouvier. I just noticed your comment. That's great Marc. Yes, I think it's worth it. I am glad to hear that everything is working fine. - Ensheathe

The user with UID 10001 does not exist in your image. This does not prevent the USER instruction in your Dockerfile from working, nor does it make the image invalid, but it seems to cause issues with Azure containers.

I cannot find any documentation or reference explaining why it doesn't work on Azure (I will update if I do), but adding the user to the image should solve the issue. Try adding something like this to your Dockerfile to create a user with UID 10001 (this must be done as root, i.e. with user 0):

RUN useradd -u 10001 myuser
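If the entry is appended by a script rather than with useradd, it may be worth making the step idempotent, since a second entry for the same UID would also violate an "exactly one match" lookup. The sketch below is one way to do that. ensure_uid_entry is a hypothetical helper, and the entry format simply copies the line that start.sh writes in the other answer (GID 0, home /opt, shell /sbin/nologin); adjust it if your image needs something else.

```shell
# Hypothetical helper: append a passwd entry for a UID only if no entry
# with that UID exists yet.
# $1 = passwd file, $2 = numeric UID, $3 = user name
ensure_uid_entry() {
    # awk exits 0 when at least one line has the UID in field 3, else 1.
    if ! awk -F: -v uid="$2" '$3 == uid { found = 1 } END { exit !found }' "$1"; then
        printf '%s:x:%s:0:%s user:/opt:/sbin/nologin\n' "$3" "$2" "$3" >> "$1"
    fi
}
```

As root in the image build, this would be invoked along the lines of ensure_uid_entry /etc/passwd 10001 ejbca; calling it twice leaves a single entry.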

Additional notes showing that user 10001 does not exist in the image:

# When running container, not recognized by system
$ docker run docker.io/primekey/ejbca-ce:6.15.2.3 whoami
whoami: cannot find name for user ID 10001

# Not present in /etc/passwd
$ docker run docker.io/primekey/ejbca-ce:6.15.2.3 cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
Pewit answered 17/11, 2020 at 9:6 Comment(0)
