How to migrate Docker volume between hosts?

Docker's documentation states that volumes can be "migrated" - which I'm assuming means that I should be able to move a volume from one host to another host. (More than happy to be corrected on this point.) However, the same documentation page doesn't provide information on how to do this.

Digging around on SO, I have found an older question (circa 2015-ish) that states that this is not possible, but given that it's 2 years on, I thought I'd ask again.

In case it helps, I'm developing a Flask app that uses TinyDB + local disk as its data storage - I've determined that I don't need anything fancier than that; this is a learning project at the moment, so I've decided to go extremely lightweight. The project is structured as follows:

/project_directory
|- /app
   |- __init__.py
   |- ...
|- run.py  # assumes `data/databases/ and data/files/` are present
|- Dockerfile
|- data/
   |- databases/
      |- db1.json
      |- db2.json
   |- files/
      |- file1.pdf
      |- file2.pdf

I have the folder data/* inside my .dockerignore and .gitignore, so that they are not placed under version control and are ignored by Docker when building the images.

While developing the app, I am also trying to work with database entries and PDFs that are as close to real-world as possible, so I seeded the app with a very small subset of real data, which is stored on a volume that is mounted directly into data/ when the Docker container is instantiated.

What I want to do is deploy the container on a remote host, but have the remote host seeded with the starter data (ideally, this would be the volume that I've been using locally, for maximal convenience); later on as more data are added on the remote host, I'd like to be able to pull that back down so that during development I'm working with up-to-date data that my end users have entered.

Looking around, the "hacky" way I'm thinking of doing this is simply using rsync, which might work out just fine. However, if there's a solution I'm missing, I'd greatly appreciate guidance!
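
For concreteness, a minimal sketch of the rsync approach I have in mind (host and paths are placeholders):

# push the local seed data up to the remote host
rsync -avz ./data/ user@remote-host:/srv/project/data/

# later: pull the updated data back down for local development
rsync -avz user@remote-host:/srv/project/data/ ./data/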

Viscacha answered 16/8, 2017 at 13:8
This might be useful: guidodiepen.nl/2016/05/… (Antibiosis)
For completeness, there is an extension for Docker Desktop to back up and share volumes. Presented here: docker.com/blog/… (Defeasance)

The way I would approach this is to generate a Docker container that stores a copy of the data you want to seed your development environment with. You can then expose the data in that container as a volume, and finally mount that volume into your development containers. I'll demonstrate with an example:

Creating the Data Container

First, we'll create a Docker image that contains your seed data and nothing else. I'd create a Dockerfile at ~/data/Dockerfile and give it the following content:

FROM alpine:3.4
ADD . /data
VOLUME /data
CMD /bin/true

You could then build this with:

docker build -t myproject/my-seed-data .

This will create a Docker image tagged as myproject/my-seed-data:latest. The image simply contains all of the data you want to seed the environment with, stored at /data within the image. Whenever we create an instance of the image as a container, it will expose all of the files within /data as a volume.
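
If you want to double-check that a container created from this image really does expose /data as a volume, a quick sanity check could look like this (the container name is arbitrary):

docker run --name seed-check myproject/my-seed-data
docker inspect -f '{{ json .Mounts }}' seed-check
docker rm -v seed-check   # clean up the container and its anonymous volume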

Mounting the volume into another Docker container

I imagine you're running your Docker container something like this:

docker run -d -v $(pwd)/data:/data your-container-image <start_up_command>

You could now extend that to do the following:

docker run -d --name seed-data myproject/my-seed-data
docker run -d --volumes-from seed-data your-container-image <start_up_command>

What we're doing here is first creating an instance of your seed data container. We're then creating an instance of the development container and mounting the volumes from the data container into it. This means that you'll get the seed data at /data within your development container.

It's a bit of a pain that you now need to run two commands, so we can orchestrate it a bit better with something like Docker Compose.

Simple Orchestration with Docker Compose

Docker Compose is a way of running more than one container at the same time. You can declare what your environment needs to look like and do things like define:

"My development container depends on an instance of my seed data container"

You create a docker-compose.yml file to lay out what you need. It would look something like this:

version: "2"
services:
  seed-data:
    image: myproject/my-seed-data:latest

  my_app:
    build: .
    volumes_from:
      - seed-data
    depends_on:
      - seed-data

You can then start all containers at once using docker-compose up -d my_app. Docker Compose is smart enough to start an instance of your data container first, and then your app container.

Sharing the Data Container between hosts

The easiest way to do this is to push your data container as an image to Docker Hub. Once you have built the image, it can be pushed to Docker Hub as follows:

docker push myproject/my-seed-data:latest

It's very similar in concept to pushing a Git commit to a remote repository, except in this case you're pushing a Docker image. It means that any environment can now pull this image and use the data contained within it. You can re-generate the data image whenever you have new seed data, push it to Docker Hub under the :latest tag, and the next time you restart your dev environment it will have the latest data.
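
On the remote host, pulling and using the seed data would then look something like this (your-container-image is the same placeholder app image as above):

docker pull myproject/my-seed-data:latest
docker run -d --name seed-data myproject/my-seed-data:latest
docker run -d --volumes-from seed-data your-container-image <start_up_command>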

To me this is the "Docker" way of sharing data and it keeps things portable between Docker environments. You can also do things like have your data container generated on a regular basis by a job within a CI environment like Jenkins.

Urea answered 16/8, 2017 at 14:0
A quick follow-on question, if you'd be kind enough to help: say my users on the remote host have added new data, and I'd like to pull that back down locally without doing docker commits (I hear it gets ugly really fast). Is there a way to do this? (Viscacha)
@Viscacha You could, for example, docker cp the data out of a container, build another data container based upon that data, push the new data container to Docker Hub and then docker pull it locally. That could be scripted on a daily basis and managed by a CI server, e.g. Jenkins. Make sense? (Urea)
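
A rough sketch of the round trip described in that last comment, assuming the app container on the remote host is named your-app-container and the seed image's build context is ~/data as above:

# on the remote host (or a CI job): copy the live data out of the app container
# into the build context for the seed image
docker cp your-app-container:/data/. ~/data/

# rebuild and push the refreshed seed image
docker build -t myproject/my-seed-data:latest ~/data
docker push myproject/my-seed-data:latest

# locally: pull the refreshed image and recreate the seed-data container
docker pull myproject/my-seed-data:latest
docker rm -f seed-data
docker run -d --name seed-data myproject/my-seed-data:latest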

According to the Docker docs, you can also create a backup and restore it:

Backup volume

docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu tar cvf /backup/backup.tar /MOUNT_POINT_OF_VOLUME

Restore volume from backup on another host

docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu bash -c "cd /MOUNT_POINT_OF_VOLUME && \
tar xvf /backup/backup.tar --strip 1"
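
If your data lives in a named volume rather than inside a container, a similar sketch mounts the volume directly (VOLUME_NAME is a placeholder):

# backup
docker run --rm -v VOLUME_NAME:/volume -v \
$(pwd):/backup ubuntu tar cvf /backup/backup.tar -C /volume .

# restore on the other host
docker run --rm -v VOLUME_NAME:/volume -v \
$(pwd):/backup ubuntu tar xvf /backup/backup.tar -C /volume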

Or (what I prefer), just copy it to local storage:

docker cp --archive CONTAINER:/MOUNT_POINT_OF_VOLUME ./LOCAL_FOLDER

then copy it to the other host and start the container there with, e.g.:

docker run -v $(pwd)/LOCAL_FOLDER:/MOUNT_POINT_OF_VOLUME some_image
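
The "copy it to the other host" step can be done however you like, e.g. with scp (host and path are placeholders):

scp -r ./LOCAL_FOLDER user@other-host:/srv/app/LOCAL_FOLDER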
Enchilada answered 23/9, 2021 at 0:49

You can use this trick:

docker run --rm -v <SOURCE_DATA_VOLUME_NAME>:/from alpine ash -c "cd /from ; tar -cf - . " | ssh <TARGET_HOST> 'docker run --rm -i -v <TARGET_DATA_VOLUME_NAME>:/to alpine ash -c "cd /to ; tar -xpvf - " '

more information
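
For example, with a volume named app_data on both hosts and the target reachable over SSH as user@remote-host (all names here are placeholders):

docker run --rm -v app_data:/from alpine ash -c "cd /from ; tar -cf - . " | ssh user@remote-host 'docker run --rm -i -v app_data:/to alpine ash -c "cd /to ; tar -xpvf - " '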

Echolocation answered 4/8, 2020 at 7:10
