ssh-agent does not remember identities when running inside a docker container in DC/OS

Asked 10/8, 2017 at 5:13 Answered 6/9, 2017 at 1:18

I am trying to run a service using DC/OS and Docker. I created my Stack using the template for my region from here. I also created the following Dockerfile:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y expect openssh-client

WORKDIR "/root"
ENTRYPOINT eval "$(ssh-agent -s)" && \
           mkdir -p .ssh && \
           echo $PRIVATE_KEY > .ssh/id_rsa && \
           chmod 600 /root/.ssh/id_rsa && \
           expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\" send \"\"; interact " && \
           while true; do ssh-add -l; sleep 2; done

I have a private repository that I would like to clone/pull from when the docker container starts. This is why I am trying to add the private key to the ssh-agent.

If I run this image as a docker container locally and supply the private key using the PRIVATE_KEY environment variable, everything works fine. I see that the identity is added.

The problem that I have is that when I try to run a service on DC/OS using the docker image, the ssh-agent does not seem to remember the identity that was added using the private key.

I have checked the error log from DC/OS. There are no errors.

Does anyone know why running the docker container on DC/OS is any different compared to running it locally?

EDIT: I have added details of the description of the DC/OS service in case it helps:

{
  "id": "/SOME-ID",
  "instances": 1,
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "gpus": 0,
  "constraints": [],
  "fetch": [],
  "storeUrls": [],
  "backoffSeconds": 1,
  "backoffFactor": 1.15,
  "maxLaunchDelaySeconds": 3600,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "IMAGE NAME FROM DOCKERHUB",
      "network": "BRIDGE",
      "portMappings": [{
        "containerPort": SOME PORT NUMBER,
        "hostPort": SOME PORT NUMBER,
        "servicePort": SERVICE PORT NUMBER,
        "protocol": "tcp",
        "name": "default"
      }],
      "privileged": false,
      "parameters": [],
      "forcePullImage": true
    }
  },
  "healthChecks": [],
  "readinessChecks": [],
  "dependencies": [],
  "upgradeStrategy": {
    "minimumHealthCapacity": 1,
    "maximumOverCapacity": 1
  },
  "unreachableStrategy": {
    "inactiveAfterSeconds": 300,
    "expungeAfterSeconds": 600
  },
  "killSelection": "YOUNGEST_FIRST",
  "requirePorts": true,
  "env": {
    "PRIVATE_KEY": "ID_RSA PRIVATE_KEY WITH \n LINE BREAKS"
  }
}

Pipes answered 10/8, 2017 at 5:13 Comment(9)

what output you get from the above code? – Ineffective 11/8, 2017 at 8:58

The agent has no identities. – Pipes 11/8, 2017 at 18:25

Not sure if I understand... How should this work? Either you need to include the key in the Docker image and push it to the registry before using it on DC/OS, or you need to use the env var as well (which will only get persisted in the running container, as you don't use any volumes etc.) – Altogether 13/8, 2017 at 10:24

That means that you did not manage to add the key to the agent for some reason. Investigate deeper which commands were actually run and which were not. – Ineffective 13/8, 2017 at 14:10

@Tobi, that is why I am using the environmental variable $PRIVATE_KEY. We do not want to store any keys inside the docker image since we store our images publicly. – Pipes 14/8, 2017 at 19:24

@Ineffective if any of the six lines of code in the ENTRYPOINT were not actually being run, the container would not have produced the correct output when I ran it locally. The issue only appears when containers are run in DC/OS. – Pipes 15/8, 2017 at 1:1

Have you considered using secrets like described at docs.mesosphere.com/1.9/security/secrets ? Edit: it only seems to be available for Enterprise DC/OS. – Synthetic 18/8, 2017 at 21:57

Did you push the latest image? – Microdont 22/8, 2017 at 14:23

Does the Docker version of your DC/OS cluster match the Docker version you are using for local testing? Which version is it? – Tu 6/9, 2017 at 0:19

Docker Version

Check that your local version of Docker matches the version installed on the DC/OS agents. By default, the DC/OS 1.9.3 AWS CloudFormation templates use CoreOS 1235.12.0, which comes with Docker 1.12.6. It's possible that the entrypoint behavior has changed since then.

Docker Command

Check the Mesos task logs for the Marathon app in question and see what docker run command was executed. You might be passing it slightly different arguments when testing locally.

Script Errors

As mentioned in another answer, the script you provided has several errors that may or may not be related to the failure.

echo $PRIVATE_KEY should be echo "$PRIVATE_KEY" to preserve line breaks. Without the quotes the key file is written on a single line and ssh-add fails with "Bad passphrase, try again for /root/.ssh/id_rsa:".
expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\" send \"\"; interact " should be expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\"; send \"\n\"; interact ". The original is missing a semicolon after the expect pattern and a line break in the send string, so the expect command fails without executing.
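The effect of the missing quotes can be checked in any POSIX-style shell. This is only a minimal sketch: the three-line value below is a made-up stand-in for a real key, and the variable names unquoted and quoted are illustrative.

```shell
# Hypothetical stand-in for a multi-line private key (not a real key).
PRIVATE_KEY='-----BEGIN KEY-----
abc123
-----END KEY-----'

# Unquoted: the shell splits the value on whitespace, and echo joins the
# resulting words with single spaces, so the line breaks are lost.
unquoted=$(echo $PRIVATE_KEY)

# Quoted: the value is passed as one argument and the line breaks survive.
quoted=$(echo "$PRIVATE_KEY")

printf 'unquoted:\n%s\n\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted form prints everything on one line, which is what ends up in .ssh/id_rsa with the original entrypoint.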

File Based Secrets

Enterprise DC/OS 1.10 (1.10.0-rc1 out now) has a new feature named File Based Secrets which allows for injecting files (like id_rsa files) without including their contents in the Marathon app definition, storing them securely in Vault using DC/OS Secrets.

File based secrets won't do the ssh-add for you, but they should make it easier and more secure to get the file into the container.

Mesos Bug

Mesos 1.2.0 switched to using Docker --env-file instead of -e to pass in environment variables. This exposes a Docker limitation: values in an env file cannot contain line breaks. A workaround was put into Mesos and DC/OS, but the fix may not be in the minor version you are using.

A manual workaround is to base64-encode the contents of id_rsa for the Marathon definition and decode them again in your entrypoint script.
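A minimal sketch of that round trip, assuming the base64 utility from GNU coreutils is available in the image. The variable name PRIVATE_KEY_B64, the temporary directory, and the placeholder key text are all hypothetical; in a real entrypoint the decoded file would go to /root/.ssh/id_rsa.

```shell
# Hypothetical stand-in for the contents of id_rsa (not a real key).
key='-----BEGIN KEY-----
abc123
-----END KEY-----'

# On your workstation: encode to a single line for the Marathon "env" block.
# tr strips the line wrapping that base64 may add to long output.
PRIVATE_KEY_B64=$(printf '%s\n' "$key" | base64 | tr -d '\n')

# In the entrypoint: decode back into a file with the line breaks restored.
# -d is the GNU coreutils spelling of the decode option.
tmpdir=$(mktemp -d)
printf '%s' "$PRIVATE_KEY_B64" | base64 -d > "$tmpdir/id_rsa"
chmod 600 "$tmpdir/id_rsa"
```

Because the encoded value contains no line breaks, it should pass through an env file unchanged regardless of how the variable is handed to Docker.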

Tu answered 6/9, 2017 at 1:18 Comment(2)

This answer has been accepted - I'd be interested in the actual detail that triggered the error. Can you explain which aspect fixed your issue @siavashk? – Synthetic 11/9, 2017 at 19:44

Docker versions. – Pipes 11/9, 2017 at 21:56

The key file contents being passed via PRIVATE_KEY originally contain line breaks. After echoing the PRIVATE_KEY variable content to ~/.ssh/id_rsa the line breaks will be gone. You can fix that issue by wrapping the $PRIVATE_KEY variable with double quotes.

Another issue arises when the container is started without an attached TTY (a TTY is typically attached by passing -i -t to docker run). Without one, the passphrase prompt fails and the key is not added to the ssh-agent. For a container running in DC/OS, interaction probably doesn't make sense anyway, so you should change your entrypoint script accordingly. That requires your ssh key to have no passphrase.

This changed Dockerfile should work:

ENTRYPOINT eval "$(ssh-agent -s)" && \
           mkdir -p .ssh && \
           echo "$PRIVATE_KEY" > .ssh/id_rsa && \
           chmod 600 /root/.ssh/id_rsa && \
           ssh-add /root/.ssh/id_rsa && \
           while true; do ssh-add -l; sleep 2; done
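To see whether a given environment attaches a TTY, the entrypoint can test its own standard input. This is a minimal sketch rather than part of the fix; the function name has_tty is made up, and redirecting from /dev/null only approximates what a container started without -i -t sees.

```shell
# Report whether standard input is attached to a terminal.
# [ -t 0 ] is the POSIX test for "file descriptor 0 is a terminal".
has_tty() {
    if [ -t 0 ]; then
        echo "tty attached"
    else
        echo "no tty"
    fi
}

# With stdin redirected from /dev/null there is no terminal.
has_tty < /dev/null
```

Logging this once at container start makes it easier to compare a local docker run with the task launched by Marathon.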

Synthetic answered 17/8, 2017 at 22:11 Comment(10)

This does not explain why the issue only appears when the docker container is run on DC/OS. – Pipes 18/8, 2017 at 18:36

I don't have enough details about the difference between your local and the DC/OS environments. I would assume that you have either different id_rsa formats or another shell when passing the PRIVATE_KEY env. I could reproduce your issue locally on Mac OS, so I think it's not DC/OS specific. – Synthetic 18/8, 2017 at 18:45

Did the double quotes even fix the problem? Maybe they are only working for me as a fix? – Synthetic 18/8, 2017 at 18:51

No, it did not fix it. I was also escaping new lines in the private key using \n, so line breaks were not an actual issue. I did actually list the configuration of the DC/OS on cloud formation in the original question. You can find it here. As for the json file for the DC/OS service, see my edits – Pipes 18/8, 2017 at 19:57

Can you/did you verify whether the env variable inside the container is the expected private key with correct line breaks? A simple cat .ssh/id_rsa would show the actual result ssh-add and ssh-agent would "see". I'm still not done with the line breaks ;-) – Synthetic 18/8, 2017 at 21:48

Yes, I verified the obvious. – Pipes 18/8, 2017 at 22:30

Let us continue this discussion in chat. – Synthetic 19/8, 2017 at 10:10

One difference between local and DC/OS is probably an attached TTY (typically enabled with -i -t). Without interactive TTY my local tests also fail with the error message "The agent has no identities". – Synthetic 19/8, 2017 at 10:31

I've updated my answer to also consider the interactive mode/attached TTY. – Synthetic 19/8, 2017 at 22:28

@Pipes did you check the non-interactive variant of the entrypoint script? – Synthetic 21/8, 2017 at 21:4
