Understanding the "VOLUME" instruction in a Dockerfile

Below is the content of my "Dockerfile"

FROM node:boron

# Create app directory
RUN mkdir -p /usr/src/app

# Change working dir to /usr/src/app
WORKDIR /usr/src/app

VOLUME . /usr/src/app

RUN npm install

EXPOSE 8080

CMD ["node" , "server" ]

In this file I expect the VOLUME . /usr/src/app instruction to mount the contents of the present working directory on the host at the /usr/src/app folder in the container.

Is this the correct way to do it?

Rag answered 30/1, 2017 at 12:1 Comment(1)
tldr: docker run -it -v /host/path:/container/path containername bash is enough, and VOLUME inside the Dockerfile is not needed. - Decided
R
184

The official docker tutorial says:

A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:

  • Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization. (Note that this does not apply when mounting a host directory.)

  • Data volumes can be shared and reused among containers.

  • Changes to a data volume are made directly.

  • Changes to a data volume will not be included when you update an image.

  • Data volumes persist even if the container itself is deleted.

In a Dockerfile you can specify only the destination of a volume inside the container, e.g. /usr/src/app.

When you run a container, e.g. docker run --volume=/opt:/usr/src/app my_image, you may, but do not have to, specify its mount point (/opt) on the host machine. If you do not pass the --volume argument, the mount point is chosen automatically, usually under /var/lib/docker/volumes/.
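To make that concrete for the Dockerfile in the question, here is a minimal sketch. It assumes the image is tagged my_image as in the command above, writes a trimmed Dockerfile (RUN npm install is left out) into a scratch directory, and only prints the build and run commands instead of executing them. The scratch variable is a name made up for this example.

```shell
# Sketch only: write a corrected, trimmed Dockerfile into a scratch directory.
scratch="$(mktemp -d)"
cat > "$scratch/Dockerfile" <<'EOF'
FROM node:boron
WORKDIR /usr/src/app
# Only the container-side path can be declared here, never a host path.
VOLUME /usr/src/app
EXPOSE 8080
CMD ["node", "server"]
EOF

# The host side is chosen at run time; the commands are printed, not executed.
echo "docker build -t my_image $scratch"
echo "docker run -p 8080:8080 -v \"\$(pwd)\":/usr/src/app my_image"
```

When the host directory is passed with -v like this, the VOLUME line is not required for the mount to work; it only declares the container-side path.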

Rattlebrain answered 30/1, 2017 at 12:15 Comment(4)
An awesome explanation, in video, on how storage works in Docker - Heron
"In Dockerfile you can specify only the destination of a volume inside a container. e.g. /usr/src/app." This is the exact line I needed to read to stop investing more time on this solution - Flatiron
Sorry for reviving this old thread. In the Docker docs on Volumes it reads: "-v or --volume: [...] The second field is the path where the file or directory are mounted in the container." ... According to your explanation, shouldn't this rather read "mounted in the host"? - Kitchen
This is referring to the Dockerfile; you're referring to -v in the docker CLI. The CLI syntax is -v source_path_on_localhost:destination_path_in_container, whereas the Dockerfile directive is VOLUME destination_path_in_container (and then you specify the source_path_on_localhost with docker run) - Immortelle
A
563

In short: No, your VOLUME instruction is not correct.

Dockerfile's VOLUME specifies one or more volumes by their container-side paths, but it does not allow the image author to specify a host path. On the host side, the volumes are created with a very long ID-like name inside the Docker root. On my machine this is /var/lib/docker/volumes.

Note: Because the autogenerated name is extremely long and makes no sense from a human's perspective, these volumes are often referred to as "unnamed" or "anonymous".

Your example that uses a '.' character will not even run on my machine, no matter if I make the dot the first or second argument. I get this error message:

docker: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused "open /dev/ptmx: no such file or directory"".

I know that what has been said to this point is probably not very valuable to someone trying to understand VOLUME and -v and it certainly does not provide a solution for what you try to accomplish. So, hopefully, the following examples will shed some more light on these issues.

Minitutorial: Specifying volumes

Given this Dockerfile:

FROM openjdk:8u131-jdk-alpine
VOLUME vol1 vol2

(For the outcome of this minitutorial, it makes no difference if we specify vol1 vol2 or /vol1 /vol2 — this is because the default working directory within a Dockerfile is /)

Build it:

docker build -t my-openjdk .

Run:

docker run --rm -it my-openjdk

Inside the container, run ls in the command line and you'll notice two directories exist; /vol1 and /vol2.

Running the container also creates two directories, or "volumes", on the host-side.

While having the container running, execute docker volume ls on the host machine and you'll see something like this (I have replaced the middle part of the name with three dots for brevity):

DRIVER    VOLUME NAME
local     c984...e4fc
local     f670...49f0

Back in the container, execute touch /vol1/weird-ass-file (creates a blank file at said location).

This file is now available on the host machine, in one of the unnamed volumes lol. It took me two tries because I first tried the first listed volume, but eventually I did find my file in the second listed volume, using this command on the host machine:

sudo ls /var/lib/docker/volumes/f670...49f0/_data

Similarly, you can try to delete this file on the host and it will be deleted in the container as well.

Note: The _data folder is also referred to as a "mount point".

Exit out from the container and list the volumes on the host. They are gone. We used the --rm flag when running the container and this option effectively wipes out not just the container on exit, but also the volumes.

Run a new container, but specify a volume using -v:

docker run --rm -it -v /vol3 my-openjdk

This adds a third volume and the whole system ends up having three unnamed volumes. The command would have crashed had we specified only -v vol3. The argument must be an absolute path inside the container. On the host-side, the new third volume is anonymous and resides together with the other two volumes in /var/lib/docker/volumes/.

It was stated earlier that the Dockerfile cannot map to a host path, which poses a problem when we want to bring files from the host into the container at runtime. A different -v syntax solves this problem.

Imagine I have a subfolder in my project directory ./src that I wish to sync to /src inside the container. This command does the trick:

docker run -it -v $(pwd)/src:/src my-openjdk

Both sides of the : character expect an absolute path: the left side is an absolute path on the host machine, the right side is an absolute path inside the container. pwd is a command that prints the current working directory. Putting the command in $() runs it in a subshell and substitutes its output, here the absolute path to our project directory.
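A small sketch of how that argument could be assembled and sanity-checked before running anything. The is_absolute helper and the variable names are made up for this example, and the docker command is only printed, not executed.

```shell
host_dir="$(pwd)/src"   # absolute path on the host, built via command substitution
container_dir="/src"    # absolute path inside the container

# Hypothetical helper: succeeds only for paths that start with a slash.
is_absolute() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

if is_absolute "$host_dir" && is_absolute "$container_dir"; then
  mount_arg="$host_dir:$container_dir"
  # Quoting the argument keeps host paths containing spaces intact.
  echo "docker run --rm -it -v \"$mount_arg\" my-openjdk"
else
  echo "both sides of ':' should be absolute paths" >&2
fi
```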

Putting it all together, assume we have ./src/Hello.java in our project folder on the host machine with the following contents:

public class Hello {
    public static void main(String... ignored) {
        System.out.println("Hello, World!");
    }
}

We build this Dockerfile:

FROM openjdk:8u131-jdk-alpine
WORKDIR /src
ENTRYPOINT javac Hello.java && java Hello

We run this command:

docker run -v $(pwd)/src:/src my-openjdk

This prints "Hello, World!".

The best part is that we're completely free to modify the .java file with a new message for another output on a second run - without having to rebuild the image =)

Final remarks

I am quite new to Docker, and the aforementioned "tutorial" reflects information I gathered from a 3-day command line hackathon. I am almost ashamed I haven't been able to provide links to clear English-like documentation backing up my statements, but I honestly think this is due to a lack of documentation and not personal effort. I do know the examples work as advertised using my current setup which is "Windows 10 -> Vagrant 2.0.0 -> Docker 17.09.0-ce".

The tutorial does not solve the problem "how do we specify the container's path in the Dockerfile and let the run command only specify the host path". There might be a way, I just haven't found it.

Finally, I have a gut feeling that specifying VOLUME in the Dockerfile is not just uncommon, but it's probably a best practice to never use VOLUME. For two reasons. The first reason we have already identified: We can not specify the host path - which is a good thing because Dockerfiles should be very agnostic to the specifics of a host machine. But the second reason is people might forget to use the --rm option when running the container. One might remember to remove the container but forget to remove the volume. Plus, even with the best of human memory, it might be a daunting task to figure out which of all anonymous volumes are safe to remove.
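As a rough aid for that last point, here is a sketch that tells auto-generated volume names apart from human-chosen ones. It rests on an assumption: that anonymous volume names are exactly 64 lowercase hex characters. The is_anonymous_volume helper and the mysql-data name are made up for this example, and the long sample name is simply one that appears elsewhere on this page.

```shell
# Hypothetical helper: succeed when a volume name looks auto-generated.
# Assumption: anonymous volume names are exactly 64 lowercase hex characters.
is_anonymous_volume() {
  name="$1"
  [ "${#name}" -eq 64 ] || return 1
  case "$name" in
    *[!0-9a-f]*) return 1 ;;   # any non-hex character means a human-chosen name
  esac
  return 0
}

for v in 320752e0e70d1590e905b02d484c22689e69adcbd764a69e39b17bc330b984e4 mysql-data; do
  if is_anonymous_volume "$v"; then
    echo "$v: looks anonymous"
  else
    echo "$v: looks named"
  fi
done
```

On a real host the names could come from docker volume ls -q instead of the hard-coded list. This only classifies names; it says nothing about whether the data inside a volume is still needed.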

Archean answered 28/10, 2017 at 17:11 Comment(14)
When should we use unnamed/anonymous volumes? - Equation
@Martin thank you very much. Your hackathon and its resulting tutorial here is very much appreciated. - Causality
"I haven't been able to provide links to clear English-like documentation ... I honestly think this is due to a lack of documentation". I can confirm. This is the most thorough and up-to-date documentation I've found and I've been looking for hours. - Pulpboard
I get that if you call docker run w/o --rm, the host machine has these unnamed volumes left over. Then if you call docker run w/o --rm again, the host machine will have new unnamed volumes, right? Eventually, the host machine will blow up, right? - Pythagoras
lol yeah that's my bet, the machine will completely blow up over time. But, I can't say for sure cos I haven't tested it. Maybe the Docker guys wrote some hacks into the code in order to purge unused volumes at some point in time? I dunno. Wouldn't count on it lol - Archean
docker volume prune can be used to clean up leftover volumes that aren't attached to running containers. Not to say that it'll be easy to distinguish potentially important ones by id alone... - Haze
I am newly learning about Docker (especially volumes etc.) and I find the answer very helpful. - Pochard
"For the outcome of this minitutorial, it makes no difference if we specify vol1 vol2 or /vol1 /vol2 - don't ask me why". @MartinAndersson that's because the current working directory is /, so vol1 is relative to /, which resolves to /vol1. If you use WORKDIR to specify a working directory other than /, vol1 and /vol1 would no longer point to the same directory. - Zoellick
That tutorial style is great, but it does not explain the exact details, which are better explained in the other (now downrated IMHO) answers. It does not mention the performance and copy-on-write behavior, which seems to be crucial here. - Wriggler
All makes sense, but the volume created on the Docker host should not be deleted when the container is deleted, because that defeats the purpose of specifying a volume. I mean, why would you specify VOLUME in the Dockerfile? - Sperrylite
Please note the build command needs to specify the path to the Docker context, or just a dot at the end: docker build -t my-openjdk . - Jobholder
The best mini tutorial, clear and concise. Thank you! - Manners
Very conscientious, the world needs more people like you! - Trocki
This should seriously get into the docs - Despain
R
110

Specifying a VOLUME line in a Dockerfile configures a bit of metadata on your image, but how that metadata is used is important.

First, what did these two lines do:

WORKDIR /usr/src/app
VOLUME . /usr/src/app

The WORKDIR line creates the directory if it doesn't exist and updates the image metadata so that all relative paths, and the current directory for commands like RUN, resolve to that location. The VOLUME line specifies two volumes: one is the relative path ., the other is /usr/src/app, and both happen to be the same directory. Most often the VOLUME line contains only a single directory, but it can contain several, as you've done, or it can be a JSON-formatted array.

You cannot specify a volume source in the Dockerfile: A common source of confusion when specifying volumes in a Dockerfile is trying to match the runtime syntax of a source and destination at image build time. This will not work: the Dockerfile can only specify the destination of the volume. It would be a trivial security exploit if someone could define the source of a volume, since they could update a common image on Docker Hub to mount the host's root directory into the container and then launch a background process inside the container, as part of an entrypoint, that adds logins to /etc/passwd, configures systemd to launch a bitcoin miner on next reboot, or searches the filesystem for credit cards, SSNs, and private keys to send off to a remote site.

What does the VOLUME line do? As mentioned, it sets some image metadata to say a directory inside the image is a volume. How is this metadata used? Every time you create a container from this image, docker will force that directory to be a volume. If you do not provide a volume in your run command or compose file, the only option for docker is to create an anonymous volume. This is a local named volume with a long unique id for the name and no other indication of why it was created or what data it contains (anonymous volumes are where data goes to get lost). If you override the volume, pointing to a named or host volume, your data will go there instead.

VOLUME breaks things: You cannot disable a volume once it is defined in a Dockerfile. More importantly, with the classic builder each RUN command is executed in a temporary container. Those temporary containers get a temporary anonymous volume, which is initialized with the contents of your image. Any writes to that directory from your RUN command are made to that volume. When the RUN command finishes, changes to the image are saved, and changes to the anonymous volume are discarded. Because of this, I strongly recommend against defining a VOLUME inside the Dockerfile. It results in unexpected behavior for downstream users of your image who wish to extend the image with initial data in the volume location.

How should you specify a volume? To specify where you want to include volumes with your image, provide a docker-compose.yml. Users can modify that to adjust the volume location to their local environment, and it captures other runtime settings like publishing ports and networking.
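A minimal sketch of what such a compose file could look like, assuming a single service called app, an image tagged my_image, a ./src directory on the host and port 8080 (all of these are placeholder names for illustration). The file is written to a scratch directory and the command is only printed, not executed.

```shell
# Sketch only: write an example compose file into a scratch directory.
compose_dir="$(mktemp -d)"
cat > "$compose_dir/docker-compose.yml" <<'EOF'
services:
  app:
    image: my_image
    ports:
      - "8080:8080"
    volumes:
      # Host path (left) can be adjusted per environment; container path (right) stays fixed.
      - ./src:/usr/src/app
EOF

echo "docker compose -f $compose_dir/docker-compose.yml up"
```

Users who need a different host location only edit the left-hand side of the volumes entry; the image itself stays free of any VOLUME declaration.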

Someone should document this! They have. Docker includes warnings on the VOLUME usage in their documentation on the Dockerfile along with advice to specify the source at runtime:

  • Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.

...

  • The host directory is declared at container run-time: The host directory (the mountpoint) is, by its nature, host-dependent. This is to preserve image portability, since a given host directory can’t be guaranteed to be available on all hosts. For this reason, you can’t mount a host directory from within the Dockerfile. The VOLUME instruction does not support specifying a host-dir parameter. You must specify the mountpoint when you create or run the container.

The behavior of defining a VOLUME followed by RUN steps in a Dockerfile has changed with the introduction of buildkit. Here are two examples. First the Dockerfile:

$ cat df.vol-run 
FROM busybox

WORKDIR /test
VOLUME /test
RUN echo "hello" >/test/hello.txt \
 && chown -R nobody:nobody /test

Next, building without buildkit. Note how the changes from the RUN step are lost:

$ DOCKER_BUILDKIT=0 docker build -t test-vol-run -f df.vol-run .
Sending build context to Docker daemon  23.04kB
Step 1/4 : FROM busybox
 ---> beae173ccac6
Step 2/4 : WORKDIR /test
 ---> Running in aaf2c2920ebd
Removing intermediate container aaf2c2920ebd
 ---> 7960bec5b546
Step 3/4 : VOLUME /test
 ---> Running in 9e2fbe3e594b
Removing intermediate container 9e2fbe3e594b
 ---> 5895ddaede1f
Step 4/4 : RUN echo "hello" >/test/hello.txt  && chown -R nobody:nobody /test
 ---> Running in 2c6adff98c70
Removing intermediate container 2c6adff98c70
 ---> ef2c30f207b6
Successfully built ef2c30f207b6
Successfully tagged test-vol-run:latest

$ docker run -it test-vol-run /bin/sh
/test # ls -al 
total 8
drwxr-xr-x    2 root     root          4096 Mar  6 14:35 .
drwxr-xr-x    1 root     root          4096 Mar  6 14:35 ..
/test # exit

And then building with buildkit. Note how the changes from the RUN step are preserved:

$ docker build -t test-vol-run -f df.vol-run .
[+] Building 0.5s (7/7) FINISHED                                                                         
 => [internal] load build definition from df.vol-run                                                0.0s
 => => transferring dockerfile: 154B                                                                0.0s
 => [internal] load .dockerignore                                                                   0.0s
 => => transferring context: 34B                                                                    0.0s
 => [internal] load metadata for docker.io/library/busybox:latest                                   0.0s
 => CACHED [1/3] FROM docker.io/library/busybox                                                     0.0s
 => [2/3] WORKDIR /test                                                                             0.0s
 => [3/3] RUN echo "hello" >/test/hello.txt  && chown -R nobody:nobody /test                        0.4s
 => exporting to image                                                                              0.0s
 => => exporting layers                                                                             0.0s
 => => writing image sha256:8cb3220e3593b033778f47e7a3cb7581235e4c6fa921c5d8ce1ab329ebd446b6        0.0s
 => => naming to docker.io/library/test-vol-run                                                     0.0s

$ docker run -it test-vol-run /bin/sh
/test # ls -al
total 12
drwxr-xr-x    2 nobody   nobody        4096 Mar  6 14:34 .
drwxr-xr-x    1 root     root          4096 Mar  6 14:34 ..
-rw-r--r--    1 nobody   nobody           6 Mar  6 14:34 hello.txt
/test # exit
Recce answered 4/4, 2019 at 12:49 Comment(0)
R
73

To better understand the VOLUME instruction in a Dockerfile, let us look at a typical use of it in the official MySQL Dockerfile.

VOLUME /var/lib/mysql

Reference: https://github.com/docker-library/mysql/blob/3362baccb4352bcf0022014f67c1ec7e6808b8c5/8.0/Dockerfile

/var/lib/mysql is the default location where MySQL stores its data files.

When you run a container for test purposes only, you may leave out the mount point, e.g.

docker run mysql:8

Then the MySQL container will use the default mount path specified by the VOLUME instruction in the Dockerfile. The volume is created with a very long ID-like name inside the Docker root; this is called an "unnamed" or "anonymous" volume. It lives in the /var/lib/docker/volumes folder of the underlying host system:

/var/lib/docker/volumes/320752e0e70d1590e905b02d484c22689e69adcbd764a69e39b17bc330b984e4

This is very convenient for quick tests: there is no need to specify the mount point, yet the data is still stored in a volume rather than in the container layer, which gives the best performance.

For serious use, you will want to specify the mount source yourself by using a named volume or a bind mount, e.g.

docker run  -v /my/own/datadir:/var/lib/mysql mysql:8

The command mounts the /my/own/datadir directory from the underlying host system as /var/lib/mysql inside the container. The data directory /my/own/datadir won't be automatically deleted, even if the container is deleted.

Usage of the mysql official image (Please check the "Where to Store Data" section):

Reference: https://hub.docker.com/_/mysql/

Rideout answered 14/6, 2019 at 13:28 Comment(6)
I really like your explanation. - Carlie
But Docker saves the changes anyway. Also, you can set the mount path with -v and use it without setting the volume in the Dockerfile - Modernistic
This is the best answer, that made it completely clear. Crisp and to the point with examples. - Tootle
An example! how refreshing and clear :) (seriously) - Josephinajosephine
So many tutorials and your explanation made a lot of sense ... you are a legend ... - Courteous
Excellent answer - thank you. Is the ID of the volume bound to the container? If so (and taking your example) what happens when I want in dev mode to start from scratch? Would removing the MySQL container also remove the associated volume? Would a new image (and therefore a new container) create a new volume and I would need to move the data from the older one? (all this is dev mode - for prod I would go for an explicit mount) - Despain
P
64

The VOLUME command in a Dockerfile is quite legit, totally conventional, absolutely fine to use, and it is not deprecated in any way. You just need to understand it.

We use it to point to any directories which the app in the container will write to a lot. We don't use VOLUME just because we want to share something between host and container, like a config file.

The command simply needs one param: a path to a folder inside the container, relative to WORKDIR if set. Docker will then create a volume in its graph (/var/lib/docker) and mount it to that folder in the container. Now the container has somewhere to write to with high performance. Without the VOLUME command, the write speed to the specified folder will be very slow, because the container is then using its copy-on-write strategy in the container itself. The copy-on-write strategy is a main reason why volumes exist.

If you mount over the folder specified by the VOLUME command, the command is never run because VOLUME is only executed when the container starts, kind of like ENV.

Basically with VOLUME command you get performance without externally mounting any volumes. Data will save across container runs too without any external mounts. Then when ready simply mount something over it.

Some good example use cases:
- logs
- temp folders

Some bad use cases:
- static files
- configs
- code

Pettis answered 7/3, 2019 at 20:57 Comment(4)
Regarding the good and bad example use cases, Docker's "dockerfile best-practices" page says: "You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.". I think configs are in there. - Dip
It is OK to be explicit about the VOLUME dirs for configs. However once you actually mount a config you will have to mount over that directory and therefore the VOLUME command does not run. Therefore it is pointless to use VOLUME command on a dir specified for a config. Also initializing a volume graph with a single static read only file is serious overkill. So I stand by what I said, no need for VOLUME command on configs. - Pettis
Volumes may bring different performance characteristics due to implementation detail. Database data files fit in this use case, but what would be the point of storing data alongside the (ephemeral) container storage anyway? I.e. attributing volumes existence to performance is incorrect. - Vanderbilt
+1; the only answer giving a valid reason for why to use VOLUME (and why it exists in the first place) - Exactitude
R
27

I don't consider the use of VOLUME good in any case, except if you are creating an image for yourself and no one else is going to use it.

I was negatively impacted by a VOLUME declared in base images that I extended, and only found out about the problem after the image was already running. WordPress, for example, declares the /var/www/html folder as a VOLUME, which meant that any files added or changed during the build stage weren't picked up, while live changes persisted without my knowing. There is an ugly workaround of defining the web directory in another place, but that is just a bad substitute for a much simpler fix: removing the VOLUME directive.

You can achieve the intent of a volume easily using the -v option. This not only makes it clear what the volumes of the container will be (without having to look at the Dockerfile and parent Dockerfiles), but also gives the consumer the option to use the volume or not.

It's also bad to use VOLUME for the following reasons, as said by this answer:

However, the VOLUME instruction does come at a cost.

  • Users might not be aware of the unnamed volumes being created, and continuing to take up storage space on their Docker host after containers are removed.
  • There is no way to remove a volume declared in a Dockerfile. Downstream images cannot add data to paths where volumes exist.

The latter issue results in problems like these.

Having the option to undeclare a volume would help, but only if you know the volumes defined in the dockerfile that generated the image (and the parent dockerfiles!). Furthermore, a VOLUME could be added in newer versions of a Dockerfile and break things unexpectedly for the consumers of the image.

Another good explanation (about the oracle image having VOLUME, which was removed): https://github.com/oracle/docker-images/issues/640#issuecomment-412647328

More cases in which VOLUME broke stuff for people:

A pull request to add options to reset properties of the parent image (including VOLUME) was closed and is being discussed here (where you can see several cases of people adversely affected by volumes defined in Dockerfiles). That discussion has a comment with a good explanation against VOLUME:

Using VOLUME in the Dockerfile is worthless. If a user needs persistence, they will be sure to provide a volume mapping when running the specified container. It was very hard to track down that my issue of not being able to set a directory's ownership (/var/lib/influxdb) was due to the VOLUME declaration in InfluxDB's Dockerfile. Without an UNVOLUME type of option, or getting rid of it altogether, I am unable to change anything related to the specified folder. This is less than ideal, especially when you are security-aware and desire to specify a certain UID the image should be ran as, in order to avoid a random user, with more permissions than necessary, running software on your host.

The only good thing I can see about VOLUME is about documentation, and I would consider it good if it only did that (without any side effects).

Update (2021-10-19)

One more related issue with the mysql official image: https://github.com/docker-library/mysql/issues/255

Update (2022-01-26)

I found a good article explaining about the issues with VOLUME. It's already several years old, but the same issues remain:

https://boxboat.com/2017/01/23/volumes-and-dockerfiles-dont-mix/

TL;DR

I consider that the best use of VOLUME is to be deprecated.

Rota answered 28/5, 2020 at 15:24 Comment(14)
It' really like global variable. Perfect way to shoot your leg by side effectsSigne
Regarding the good and bad example use cases, Docker's "Dockerfile best-practices" page says: "You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image."Hostelry
@colynnliu Yes, I'm aware of that, but as expected there's not an explanation of why that is good. Because it actually isn't (at least in an image that is supposed to be used by other people), and you can see from my post how bad it can be, especially considering that you can map a volume in docker run easily and satisfy the use case you posted. An exception that I can think of is intermediate images in multi-stage builds, because that doesn't affect the consumers of the image. You can see this SO answer from the author of the last link I posted: stackoverflow.com/a/44060560Rota
Wrong: Please read my response here: https://mcmap.net/q/98652/-understanding-quot-volume-quot-instruction-in-dockerfileVacillate
@Vacillate that seems just an excuse for lazy development / DevOps. You can still define volumes in K8s, mapping to any physical volume available, and that would be much better because whatever the container persists is defined explicitly, while anonymous volumes can cause issues that can be very hard to reproduce across pods because of persisted data that you don't know that is being persisted.Rota
@LucasBasquerotto I agree, in real environments you definitely need to specify your volumes and can't rely on anonymous volumes. However it is a matter of intent and ownership IMO. If the original author of the image knows that it can not persist and work as expected without a particular volume then they should add the VOLUME instruction. In other words they can't just purely rely on end developers using it as expected. I am arguing against the fact that VOLUME instruction does have its use. In my job we use ephemeral environments and the anonymous volumes of images save us a lot of time.Vacillate
@Vacillate Unless you are using the image only for internal use, it's very difficult to know that a volume should be used in all cases. Just look of how many issues of people affected due to VOLUME defined in dockerfiles I posted. Many of them (and me) have real good reasons for not having to use volumes (which goes from code that should be distributed with the image to allow updates less error prone and errors more easily reproducible across enviroments, pre-seeded database containers used for tests/CI, user permissions in the database path defined in the Dockerfile for more security, and so on).Rota
@LucasBasquerotto Personally, since we use a lot of ephemeral environment setups (garden.io) I inspect the image and look at volumes before using it. If I see volumes already mentioned, I don't bother to worry about state. If I don't see any volumes I go to the documentation of the image and start reading and looking around. For example Redis, Kafka, Zookeeper take care of their volumes. On the other hand Elastic Search doesn't. In this case it is extra work for me to figure out what Elastic Search needs to function as intended or correctly.Vacillate
@Vacillate My take is that things that are changed by the container outside of it (that remains after its removal) should be defined explicitly (not knowing them could have bad consequences, like removing a physical device that should only keep temporary data but unadvertently removed important data due to a volume that the developer didn't know of). That said, if there was an option when running docker/compose etc., that made docker ignore volumes defined in dockerfiles, I wouldn't mind people that consider dockerfile volumes useful using them, and I would just use the option to disable them.Rota
@LucasBasquerotto I must be missing something. docker rm and docker compose down do take down the volumes and remove them unless you explicitly define them. In other words anonymous ones get removed automatically. docs.docker.com/engine/reference/commandline/container_rm/….Vacillate
@LucasBasquerotto perhaps I misunderstood your comment. Can you please elaborate on why you want to disable volumes in the Dockerfile? 1. If they are anonymous, Docker will remove them when you remove the container, so no problem. 2. If you want to control their persistence, then you just define them, either by name or by mount, and in that case, since you define them, you should know they are there. The only difference is that the container will behave correctly with anonymous volumes if a scheduler such as K8s decides to move it. Otherwise you will lose state randomly. So I still see them as important.Vacillate
@Vacillate there are several cases in which you don't want volumes. Like I said in a previous comment: "Many of them (and I) have really good reasons for not using volumes (ranging from code that should be distributed with the image, to make updates less error-prone and errors easier to reproduce across environments, to pre-seeded database containers used for tests/CI, user permissions on the database path defined in the Dockerfile for more security, and so on)."Rota
@Vacillate regarding the anonymous volumes being removed, there are cases in which they aren't, and a new anonymous volume is created instead, so you end up with a lot of anonymous volumes. See: https://mcmap.net/q/100868/-docker-creates-a-new-volume-everytime-i-do-docker-compose-up. Furthermore, if the data stored in the volume is important (not ephemeral), removing the volume when the container is removed could make you lose important data. The developer might not expect this if they didn't know there was a volume in the first place.Rota
@LucasBasquerotto I guess let's agree to disagree at this point :) What you shared comes down to lack of knowledge. As long as you don't remove the container, this is the expected behavior of anonymous volumes. I remain firm on the idea that Docker containers should be immutable and that you should be able to recreate them without losing state. Omitting VOLUME breaks this important property. Authors of images should be able to protect this property without relying on the end developer. Some reading material: medium.com/containers-101/docker-anti-patterns-ad2a1fcd5ce1Vacillate
H
1

Although this is a very old post, I'd still suggest checking the latest official Docker docs if you are confused about the difference between volumes and bind mounts.

Bind mounts have been around since the early days of Docker, and I don't think they are a perfect design either; for example, the docs note that "Bind mounts allow access to sensitive files". The docs also make it clear that Docker prefers you use volumes rather than bind mounts.

The official docs also list good use cases for volumes.
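
To make the difference concrete, here is a minimal sketch of the two mount types using the --mount flag of docker run. The function names, the volume name app-data and the image name passed in are placeholders of mine; /usr/src/app is the path from the question.

```shell
# Named volume: Docker manages where the data lives on the host.
# "app-data" is a hypothetical volume name; it is created on first use.
run_with_volume() {
  docker run -d --mount type=volume,source=app-data,target=/usr/src/app "$1"
}

# Bind mount: a host directory (here the current directory) is mapped
# directly into the container, so the container sees the host files as they are.
run_with_bind_mount() {
  docker run -d --mount type=bind,source="$PWD",target=/usr/src/app "$1"
}
```

For example, run_with_bind_mount my_image started from the project directory is roughly what the question was trying to achieve with VOLUME in the Dockerfile.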

Hostelry answered 4/5, 2022 at 13:40 Comment(0)
V
0

The VOLUME instruction is very important for the following reasons:

Containers by design should be immutable. Shipping an image that relies on some data context without having the VOLUME defined as part of it breaks this immutability.

Adding VOLUME is also a way to self-document the image. It shows up in the image metadata (via docker inspect) and tells the user that these volumes are needed, rather than relying on external documentation alone.
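
As a small sketch of that self-documentation, the declared volumes can be read straight from the image metadata. The helper name image_volumes is my own, and redis is only an example image.

```shell
# Print the mount points an image declares via VOLUME,
# as JSON keyed by container path.
image_volumes() {
  docker image inspect --format '{{json .Config.Volumes}}' "$1"
}

# Example (the image must already be pulled locally):
#   image_volumes redis
```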

Because of the immutability property, K8s is free to reschedule pods to other nodes, which it does to optimize resources. So even if you don't delete or redeploy your pod, there is a chance that it gets recreated automatically and moved around. Take this example:

You are the author of a certain database image and you keep the files for the db engine in the following path: /var/database/data

If you don't add VOLUME to your Dockerfile while creating the image, you will have to rely on the end developer to make sure they add that volume, and if they don't, your pod will lose crucial state when it gets rescheduled automatically.

When the author adds VOLUME to the Dockerfile, it ensures at the very least that, if the deployment setup isn't done correctly (i.e. it doesn't explicitly create a volume), your pod/container will still behave as intended and retain its state when it gets rescheduled/redeployed, hence respecting the immutability property. In ephemeral environments devs don't bother creating explicit volumes, but the anonymous volumes are still needed for proper functioning.
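
A rough sketch of what that could look like for the database example above. The base image, the my-db-engine command and the helper name write_db_dockerfile are placeholders I made up; only the /var/database/data path comes from the example.

```shell
# Write a sketch Dockerfile into the given directory.
write_db_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
FROM debian:bookworm-slim
# Create the data directory before declaring it as a volume.
RUN mkdir -p /var/database/data
# A container started without an explicit mount for this path
# gets an anonymous volume here.
VOLUME /var/database/data
# Placeholder command standing in for the real database engine.
CMD ["my-db-engine", "--data-dir", "/var/database/data"]
EOF
}
```

After write_db_dockerfile . you would build it as usual, e.g. docker build -t my-db . (the tag is again just an example).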

I do agree with the fact that it is very annoying when using multi-stage builds that the volume is not part of the image, making it difficult to modify that part of the data. Something like UN-VOLUME could come in handy.

Vacillate answered 1/2 at 15:37 Comment(6)
I heavily disagree. You can still define volume mappings. "If you don't add VOLUME to your Dockerfile while creating the image, you will have to rely on the end developer to make sure they add that volume and if they don't then your POD will lose crucial state [...]". If there's crucial data that needs to be persisted, the developer / DevOps team should be made aware of it in the image docs and map it so that it persists in the correct place. The anonymous volume may end up persisting data on a short-lived physical device, which can be terminated later, causing all kinds of issues.Rota
You said yourself that they should be made aware through the docs. But good programming practices cannot rely purely on documentation. VOLUME enforces the intent of the original author and it is definitely needed. It is similar to designing an image for which you must provide an environment variable, for example; it is not optional. By design the author of that image knows that it is needed and that you won't get full functionality without it, hence forcing it. The VOLUME instruction kind of does the same thing: it enforces the intent of the author.Vacillate
Now, problems start when the image does have a VOLUME and you didn't properly add that volume in your particular setup, but how would removing the original VOLUME instruction help in this case? It doesn't. Furthermore, inspecting a particular image and checking its volumes section directly reflects the intent of the author, as in "look here, these volumes must exist"; in this respect it is self-documenting in the metadata itself.Vacillate
If a volume is needed but wasn't added in the Dockerfile, the developer would notice the issue the first time the container is recreated, while an anonymous volume may make it seem like everything is working even after the container is recreated, which makes errors harder to reproduce if the volume should be persisted on a different physical device.Rota
I agree that the developer should not need to know everything the container does, docker is especially good to encapsulate behaviour in the way it separates the dependencies of the container from the dependencies of other containers and the machine, for example. But a volume is something that persists even after container recreation and is something that is changed outside the container, so I would strongly advise that anyone that uses an image should know at least the volumes and ports a container created from an image expects (basically, its interface with the outside).Rota
By design, container stop, pause and recreate do not delete volumes; docker compose down or container remove does. I would attribute this to poor understanding on the part of the user. If recreate deleted volumes (anonymous or otherwise), that would be a big problem, if I understand you correctly.Vacillate
