I'm using Google's AI Platform to train machine learning models using a custom Docker image. To run existing code without modifications, I would like to mount a GCS bucket inside the container.
I think one way to achieve this is to install gcloud for authentication and gcsfuse for mounting in the container. My Dockerfile looks like this:
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
WORKDIR /root
# Install system packages.
RUN apt-get update
RUN apt-get install -y curl
# ...
# Install gcsfuse.
RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" | tee /etc/apt/sources.list.d/gcsfuse.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y gcsfuse
# Install gcloud.
RUN apt-get install -y apt-transport-https
RUN apt-get install -y ca-certificates
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN apt-get update
RUN apt-get install -y google-cloud-sdk
# ...
ENTRYPOINT ["entrypoint.sh"]
Inside the entry point script, I then try to authenticate with Google Cloud and mount the bucket. My entrypoint.sh looks like this:
#!/bin/sh
set -e
gcloud auth login
gcsfuse my-bucket-name /root/output
python3 script.py --logdir /root/output/experiment
I then build the container and run it either locally for testing or remotely on the AI Platform for the full training run:
# Run locally for testing.
nvidia-docker build -t my-image-name .
nvidia-docker run -it --rm my-image-name
# Run on AI Platform for full training run.
nvidia-docker build -t my-image-name .
gcloud auth configure-docker
nvidia-docker push my-image-name
gcloud beta ai-platform jobs submit training --region us-west1 --scale-tier custom --master-machine-type standard_p100 --master-image-uri my-image-name
Both locally and on the AI Platform, the entrypoint.sh script hangs at the line gcloud auth login, probably because it waits for user input. Is there a better way of authenticating with Google Cloud from within the container? If not, how can I automate the line that currently hangs?
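For reference, one non-interactive direction that might replace the hanging line is to authenticate with a service account key file instead of an interactive login. The following is only a minimal sketch under stated assumptions: it assumes a JSON key file for a service account with access to the bucket has been placed in the container, and the default path /root/key.json, the DRY_RUN variable, and the helper names run and auth_and_mount are all hypothetical, introduced here for illustration. It uses gcloud auth activate-service-account and the --key-file flag of gcsfuse.

```shell
#!/bin/sh
# Sketch of a non-interactive replacement for `gcloud auth login`.
# Assumption: a service account key file (JSON) exists inside the container.
# The default path below is hypothetical; GOOGLE_APPLICATION_CREDENTIALS overrides it.
set -e

KEY_FILE="${GOOGLE_APPLICATION_CREDENTIALS:-/root/key.json}"

# Run a command, or only print it when DRY_RUN is set (hypothetical helper,
# useful for checking the script on a machine without gcloud or gcsfuse).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Authenticate with the key file, then mount bucket $1 at mount point $2.
auth_and_mount() {
  if [ ! -f "$KEY_FILE" ]; then
    echo "key file not found: $KEY_FILE" >&2
    return 1
  fi
  run gcloud auth activate-service-account --key-file="$KEY_FILE"
  run gcsfuse --key-file="$KEY_FILE" "$1" "$2"
}
```

In entrypoint.sh, the two lines gcloud auth login and gcsfuse my-bucket-name /root/output would then be replaced by a call such as auth_and_mount my-bucket-name /root/output, followed by the unchanged python3 line. Whether shipping a key file in the image is acceptable depends on the setup, so this is a starting point rather than a recommendation.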