What are Docker image "layers"?

I am brand new to Docker and am trying to understand exactly what a Docker image is. Every single definition of a Docker image uses the term "layer", but does not seem to define what is meant by layer.

From the official Docker docs:

We’ve already seen that Docker images are read-only templates from which Docker containers are launched. Each image consists of a series of layers. Docker makes use of union file systems to combine these layers into a single image. Union file systems allow files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system.

So I ask, what is a layer exactly? Can someone give a few concrete examples of them? And how do these layers "snap together" to form an image?

Clarissa asked 4/7, 2015 at 15:38

I might be late, but here's my 10 cents (complementing ashishjain's answer):

Basically, a layer, or image layer, is a change on an image, or an intermediate image. Every command you specify after FROM (RUN, COPY, ENV, etc.) in your Dockerfile causes the previous image to change, thus creating a new layer. You can think of it as staging changes when you're using git: you add a file's change, then another one, then another one...

Consider the following Dockerfile:

FROM rails:onbuild
ENV RAILS_ENV production
ENTRYPOINT ["bundle", "exec", "puma"]

First, we choose a starting image: rails:onbuild, which in turn has many layers. We add another layer on top of our starting image by setting the environment variable RAILS_ENV with the ENV command. Then, we use ENTRYPOINT to tell Docker to run bundle exec puma (which boots up the Rails server) whenever a container starts. That's another layer. Both of these steps only change the image's metadata, so these two layers add no files.

The concept of layers comes in handy at the time of building images. Because layers are intermediate images, if you make a change to your Dockerfile, docker will rebuild only the layer that was changed and the ones after that. This is called layer caching.
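
A rough Python model of that rule, as a sketch only: it assumes each cache key is derived from the previous step's key plus the instruction text, which is a simplification (the real builder also checksums the files used by COPY/ADD, among other things). The helper names cache_keys and steps_to_rebuild are hypothetical.

```python
import hashlib

def cache_keys(instructions):
    """One key per instruction; each key depends on the instruction and every step before it."""
    keys, parent = [], ""
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys

def steps_to_rebuild(cache, instructions):
    """Indexes of instructions whose key is missing from the cache, i.e. steps that must run again."""
    return [i for i, key in enumerate(cache_keys(instructions)) if key not in cache]

old = ["FROM rails:onbuild", "ENV RAILS_ENV production", 'ENTRYPOINT ["bundle", "exec", "puma"]']
cache = set(cache_keys(old))  # pretend `old` has already been built once

new = ["FROM rails:onbuild", "ENV RAILS_ENV staging", 'ENTRYPOINT ["bundle", "exec", "puma"]']
print(steps_to_rebuild(cache, old))  # []
print(steps_to_rebuild(cache, new))  # [1, 2]
```

Changing the ENV line invalidates that step and every step after it, while the FROM step is still a cache hit. This is why instructions that change often are usually placed near the end of a Dockerfile.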

You can read more about it here.

Mercurial answered 20/11, 2015 at 22:06
If you change or add a layer, Docker will also build any layers that come afterwards because they might be affected by the change. – Focus
@David, in the above example, how many layers will be added? 2 or 1? – Volnay
@GouravSingla It should be 2: changing ENV is also a change. A layer looks like a git commit. – Rebroadcast

They make the most sense to me with an example...

Examining layers of your own build with docker diff

Let's take a contrived example Dockerfile:

FROM busybox

RUN mkdir /data
# imagine this is downloading source code
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one 
RUN chmod -R 0777 /data
# imagine this is compiling the app
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two 
RUN chmod -R 0777 /data
# and now this cleans up that downloaded source code
RUN rm /data/one 

CMD ls -alh /data

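Each of those dd commands writes a 1MB file (1024 blocks of 1024 bytes) to disk. Let's build the image with an extra flag to keep the temporary containers (this relies on the classic builder; BuildKit, the default builder in recent Docker releases, does not keep intermediate containers, so you may need to set DOCKER_BUILDKIT=0):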

docker image build --rm=false .

In the output, you'll see each command run in a temporary container, which we now keep instead of having it deleted automatically:

...
Step 2/7 : RUN mkdir /data
 ---> Running in 04c5fa1360b0
 ---> 9b4368667b8c
Step 3/7 : RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one
 ---> Running in f1b72db3bfaa
1024+0 records in
1024+0 records out
1048576 bytes (1.0MB) copied, 0.006002 seconds, 166.6MB/s
 ---> ea2506fc6e11

If you run docker diff on each of those container IDs, you'll see which files were added, changed, or deleted in each container:

$ docker diff 04c5fa1360b0  # mkdir /data
A /data
$ docker diff f1b72db3bfaa  # dd if=/dev/zero bs=1024 count=1024 of=/data/one
C /data
A /data/one
$ docker diff 81c607555a7d  # chmod -R 0777 /data
C /data
C /data/one
$ docker diff 1bd249e1a47b  # dd if=/dev/zero bs=1024 count=1024 of=/data/two
C /data
A /data/two
$ docker diff 038bd2bc5aea  # chmod -R 0777 /data
C /data/one
C /data/two
$ docker diff 504c6e9b6637  # rm /data/one
C /data
D /data/one

A line prefixed with A is an added file, C indicates a change to an existing file or directory, and D indicates a deletion.
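
The A/C/D classification can be sketched in Python by comparing two snapshots of the filesystem. This is a hypothetical model, not how Docker computes diffs: each snapshot maps a path to some metadata, and a directory is reported as changed when an entry is added to or removed from it, mimicking the C /data lines above.

```python
import posixpath

def fs_diff(before, after):
    """Compare two snapshots ({path: metadata}) and return docker-diff-style lines.

    A = added, C = changed, D = deleted. A directory is also reported as changed
    when an entry is added to or removed from it, since that updates the directory.
    """
    changes = {}
    for path in after.keys() - before.keys():
        changes[path] = "A"
    for path in before.keys() - after.keys():
        changes[path] = "D"
    for path in before.keys() & after.keys():
        if before[path] != after[path]:
            changes[path] = "C"
    # Adding or removing an entry modifies its parent directory.
    for path, kind in list(changes.items()):
        parent = posixpath.dirname(path)
        if kind in ("A", "D") and parent in before and parent in after:
            changes.setdefault(parent, "C")
    return [f"{kind} {path}" for path, kind in sorted(changes.items())]

MiB = 1024 * 1024
# Hypothetical snapshots: path -> (type, mode, size)
after_mkdir = {"/data": ("dir", 0o755, 0)}
after_dd = {**after_mkdir, "/data/one": ("file", 0o644, MiB)}
after_chmod = {"/data": ("dir", 0o777, 0), "/data/one": ("file", 0o777, MiB)}

print(fs_diff({}, after_mkdir))        # ['A /data']
print(fs_diff(after_mkdir, after_dd))  # ['C /data', 'A /data/one']
print(fs_diff(after_dd, after_chmod))  # ['C /data', 'C /data/one']
```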

Here's the TL;DR part

Each of these container filesystem diffs above goes into one "layer" that gets assembled when you run the image as a container. The entire file is in each layer when there's an add or change, so each of those chmod commands, despite just changing a permission bit, results in the entire file being copied into the next layer. The deleted /data/one file is still in the previous layers, 3 times in fact, and will be copied over the network and stored on disk when you pull the image.
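
To make that accounting concrete, here is a small Python model of the six RUN layers above. It is a sketch under simplifying assumptions: each layer is a dict from path to file size, directories are left out, and a hypothetical WHITEOUT object stands in for the whiteout entries that real layers use to record a deletion.

```python
WHITEOUT = object()  # hypothetical marker: "this path was deleted in this layer"
MiB = 1024 * 1024

# Files stored by each RUN step of the Dockerfile above (directories omitted).
layers = [
    {},                                    # mkdir /data
    {"/data/one": MiB},                    # dd ... of=/data/one
    {"/data/one": MiB},                    # chmod: the whole file is stored again
    {"/data/two": MiB},                    # dd ... of=/data/two
    {"/data/one": MiB, "/data/two": MiB},  # chmod again: both files stored again
    {"/data/one": WHITEOUT},               # rm only records a deletion
]

def stored_bytes(layers):
    """Bytes shipped on pull: every layer is kept, including files deleted later."""
    return sum(size for layer in layers for size in layer.values() if size is not WHITEOUT)

def visible_files(layers):
    """Apply layers bottom to top; a whiteout hides the path from the merged view."""
    merged = {}
    for layer in layers:
        for path, size in layer.items():
            if size is WHITEOUT:
                merged.pop(path, None)
            else:
                merged[path] = size
    return merged

print(stored_bytes(layers) // MiB)  # 5
print(visible_files(layers))        # {'/data/two': 1048576}
```

The stored total of 5MB roughly matches the layer sizes in the docker history output shown further on, even though only one file is visible in the merged view.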

Examining existing images

You can see the commands that go into creating the layers of an existing image with the docker history command. You can also run docker image inspect on an image and see the list of layers under the RootFS section.

Here's the history for the above image:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
a81cfb93008c        4 seconds ago       /bin/sh -c #(nop)  CMD ["/bin/sh" "-c" "ls -…   0B
f36265598aef        5 seconds ago       /bin/sh -c rm /data/one                         0B
c79aff033b1c        7 seconds ago       /bin/sh -c chmod -R 0777 /data                  2.1MB
b821dfe9ea38        10 seconds ago      /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
a5602b8e8c69        13 seconds ago      /bin/sh -c chmod -R 0777 /data                  1.05MB
08ec3c707b11        15 seconds ago      /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
ed27832cb6c7        18 seconds ago      /bin/sh -c mkdir /data                          0B
22c2dd5ee85d        2 weeks ago         /bin/sh -c #(nop)  CMD ["sh"]                   0B
<missing>           2 weeks ago         /bin/sh -c #(nop) ADD file:2a4c44bdcb743a52f…   1.16MB

The newest layers are listed on top. Of note, the two layers at the bottom are fairly old: they come from the busybox image itself. When you build an image, you inherit all the layers of the image you specify in the FROM line. There are also layers added for changes to the image metadata, like the CMD line. They take up no space (0B in the output above) and are mostly a record of which settings apply to the image you are running.
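
To compare history entries with the actual filesystem layers programmatically, you can parse the JSON printed by docker image inspect. This sketch assumes that output has already been captured into a string (for example with subprocess); the sample below uses made-up digests. Metadata-only steps such as CMD add no filesystem content, so they should not appear in RootFS.Layers.

```python
import json

def rootfs_layers(inspect_output):
    """Return, for each image in `docker image inspect` output (a JSON array),
    its RootFS layer digests, ordered from the base layer to the top."""
    return [image["RootFS"]["Layers"] for image in json.loads(inspect_output)]

# Made-up sample; real digests are full 64-character sha256 hashes.
sample = """[{"Id": "sha256:aaaa",
              "RootFS": {"Type": "layers",
                         "Layers": ["sha256:1111", "sha256:2222", "sha256:3333"]}}]"""

for layers in rootfs_layers(sample):
    print(len(layers), "filesystem layers")
    for digest in layers:
        print(" ", digest)

# To inspect a real image instead (requires Docker):
#   import subprocess
#   out = subprocess.run(["docker", "image", "inspect", "busybox"],
#                        capture_output=True, text=True, check=True).stdout
#   print(rootfs_layers(out))
```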

Why layers?

The layers have a couple of advantages. First, they are immutable. Once created, a layer identified by a sha256 hash will never change. That immutability allows images to safely build on and fork off of each other. If two Dockerfiles have the same initial set of lines and are built on the same server, they will share the same set of initial layers, saving disk space. That also means that if you rebuild an image with changes in just the last few lines of the Dockerfile, only those layers need to be rebuilt and the rest can be reused from the layer cache. This can make rebuilding Docker images very fast.
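
Immutability and sharing both follow from content addressing. A rough Python sketch, assuming a layer is identified by a hash of its content (the names are hypothetical, and a real layer digest is computed over the layer's tar archive rather than this ad hoc encoding): changing any byte changes the ID, and identical layers collapse into one stored copy. This only models storage de-duplication; skipping work during a rebuild comes from the build cache.

```python
import hashlib

def layer_digest(files):
    """Hypothetical stand-in for a layer digest: SHA-256 over (path, content)
    pairs in sorted path order."""
    h = hashlib.sha256()
    for path in sorted(files):
        encoded_path, data = path.encode(), files[path]
        # Length prefixes keep different layers from producing the same byte stream.
        h.update(len(encoded_path).to_bytes(8, "big") + encoded_path)
        h.update(len(data).to_bytes(8, "big") + data)
    return "sha256:" + h.hexdigest()

store = {}  # digest -> layer content, a toy local layer store

def add_layer(files):
    """Store a layer under its digest; identical content is stored only once."""
    digest = layer_digest(files)
    store.setdefault(digest, files)
    return digest

# Two images whose first step produces identical content share that layer.
base = {"/data/one": b"\0" * 1024}
image_a = [add_layer(base), add_layer({"/data/two": b"\0" * 1024})]
image_b = [add_layer(base), add_layer({"/data/three": b"\0" * 1024})]

print(image_a[0] == image_b[0])  # True: the shared layer has one ID
print(len(store))                # 3 layers stored, not 4
```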

Inside a container, you see the image filesystem, but that filesystem is not copied. On top of those image layers, the container mounts its own read-write filesystem layer. Every read of a file goes down through the layers until it hits a layer that has marked the file for deletion, has a copy of the file in that layer, or the read runs out of layers to search through. Every write makes a modification in the container-specific read-write layer.
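
That lookup order can be modeled in a few lines of Python. This is a toy sketch, not Docker's storage driver: layers are dicts, a hypothetical WHITEOUT object stands for a deletion marker, and a write stores the whole new file in the container layer (overlay2 similarly copies a whole file up into the container layer before modifying it).

```python
WHITEOUT = object()  # hypothetical marker meaning "deleted in this layer"

class ContainerFS:
    """Toy union filesystem: read-only image layers plus one writable container layer."""

    def __init__(self, image_layers):
        self.image_layers = image_layers  # ordered bottom to top, never modified
        self.rw_layer = {}                # the container's own read-write layer

    def read(self, path):
        # Search from the top: the container layer first, then image layers top to bottom.
        for layer in [self.rw_layer, *reversed(self.image_layers)]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)  # marked for deletion in this layer
                return layer[path]
        raise FileNotFoundError(path)  # ran out of layers to search

    def write(self, path, data):
        # Writes never touch image layers; the new content lands in the container layer.
        self.rw_layer[path] = data

    def delete(self, path):
        self.read(path)                 # raises FileNotFoundError if not visible
        self.rw_layer[path] = WHITEOUT  # hide the file without changing the image

image = [{"/etc/motd": "hello", "/data/one": "zeros"}, {"/data/one": WHITEOUT}]
container = ContainerFS(image)
container.write("/etc/motd", "changed")
print(container.read("/etc/motd"))  # changed
print(image[0]["/etc/motd"])        # hello: the image layer is untouched
```

Because the image layers are never written to, any number of containers can share them, each with its own small read-write layer.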

Reducing layer bloat

One downside of layers is that it's easy to build images that duplicate files or ship files that are deleted in a later layer. The solution is often to merge multiple commands into a single RUN command. Particularly when you are modifying or deleting existing files, you want those steps to run in the same command that first created them. A rewrite of the above Dockerfile would look like:

FROM busybox

RUN mkdir /data \
 && dd if=/dev/zero bs=1024 count=1024 of=/data/one \
 && chmod -R 0777 /data \
 && dd if=/dev/zero bs=1024 count=1024 of=/data/two \
 && chmod -R 0777 /data \
 && rm /data/one

CMD ls -alh /data

And if you compare the resulting images:

  • busybox: ~1MB
  • first image: ~6MB
  • second image: ~2MB

Just by merging together some lines in the contrived example, we got the same resulting content in our image and shrank what our build adds on top of busybox from 5MB to just the 1MB file that you see in the final image.
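
The same numbers fall out of a small Python model, assuming a layer stores in full every file that was added or whose metadata changed. The snapshots below are a hypothetical encoding of the contrived Dockerfile, with ctime (the step that last changed a file's metadata) standing in for the change chmod makes even when the mode bits are already 0777.

```python
MiB = 1024 * 1024

# State of the files under /data after each command, as path -> (size, ctime).
snapshots = [
    {},                                              # mkdir /data
    {"/data/one": (MiB, 1)},                         # dd ... of=/data/one
    {"/data/one": (MiB, 2)},                         # chmod -R 0777 /data
    {"/data/one": (MiB, 2), "/data/two": (MiB, 3)},  # dd ... of=/data/two
    {"/data/one": (MiB, 4), "/data/two": (MiB, 4)},  # chmod -R 0777 /data
    {"/data/two": (MiB, 4)},                         # rm /data/one
]

def layer_bytes(before, after):
    """Bytes a layer stores: each added or changed file in full; deletions are ~free."""
    return sum(size for path, (size, ctime) in after.items()
               if before.get(path) != (size, ctime))

# One RUN per command: one layer per consecutive pair of snapshots.
separate = sum(layer_bytes(a, b) for a, b in zip(snapshots, snapshots[1:]))
# One merged RUN: a single layer holding only the difference from start to finish.
merged = layer_bytes(snapshots[0], snapshots[-1])
print(separate // MiB, merged // MiB)  # 5 1
```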

Hyphenate answered 2/8, 2018 at 19:32
Traversing the layers during file reads entails some overhead, right? To save that overhead, does it make sense to combine multiple commands (that need to be executed together anyway) in one RUN? – Congelation
@Congelation depends on how much time you want to spend prematurely optimizing. The risk is spending hours of developer time, gigs of added bandwidth and storage, to save milliseconds of runtime. As with many things performance related, there are extremes, and there is a need to measure the issue before putting effort into fixing it. – Hyphenate
Would it be good practice to reduce layer bloat as suggested as much as possible? – Persinger
@Unknown it depends on how much layer reuse from other images on the host (including previous versions of the same image) you are losing by squashing layers. Layer reuse saves a significant amount of network and disk usage, and is lost when images have a single layer. – Hyphenate
Do large corporations running many images and containers try to optimize this at all, or is it rarely worth considering, i.e. always use separate RUNs? – Persinger

A Docker container image is created using a Dockerfile. Every line in a Dockerfile will create a layer. Consider the following dummy example:

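# This has its own number of layers, say "X"
FROM ubuntu
# This is one layer (MAINTAINER is deprecated; LABEL maintainer="FOO" is the modern equivalent)
MAINTAINER FOO
# This is one layer
RUN mkdir /tmp/foo
# This is one layer (update the package index first; -y answers the install prompt)
RUN apt-get update && apt-get install -y vim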

This will create a final image where the total number of layers will be X+3.

Paratroops answered 4/7, 2015 at 17:13
While I didn't downvote, my guess would be that this explains how to create the layers, but does in no way answer the question about what a layer is. – Bieber
I agree with @LasseV.Karlsen, ashishjain. I didn't downvote you and am in fact upvoting you for trying to help me (so +1) - but in order for me to be able to give you the green check, I need to understand what a layer actually is! Thanks again, keep going! – Clarissa
Best answer IMO. For a lot of us who move into "utilizing Docker", it gives us the gist of how the layers work. – Nicolais
"Every line in a dockerfile will create a layer" - this was very helpful for me to know. – Nicholasnichole
@Nicholasnichole That is not the full story. Most lines will create a layer, but only ADD, COPY or RUN instructions will create layers which increase the size of the resulting container image. I said most lines because if you chain together commands, or escape newlines with a backslash, the sequence of chained commands/escaped newlines will form a single command. – Glib

Per Docker's image spec via The Moby Project:

Images are composed of layers. Each layer is a set of filesystem changes. Layers do not have configuration metadata such as environment variables or default arguments - these are properties of the image as a whole rather than any particular layer.

So, essentially, a layer is just a set of changes made to the filesystem.

Freeloader answered 14/11, 2016 at 15:05
It only took me a couple hours to find it, but with this elegantly-simple answer, I finally understand what a layer is: "Each [Docker] layer is a set of filesystem changes." (Assuming this is true.) For some reason I did not understand this fundamental point when reading numerous other docs/blogs/Q+A's/etc, and I suspect the limitation was theirs and not mine. Regardless, bravo Aditya for getting to the heart of the matter. – Differentiate

Since Docker v1.10, with the introduction of content-addressable storage, the notion of a 'layer' became quite different. Layers have no notion of an image or of belonging to an image; they became merely collections of files and directories that can be shared across images. Layers and images became separated.

For example, on a locally built image from a base image, let's say, ubuntu:14.04, the docker history command yields the image chain, but some of the image IDs will be shown as 'missing' because the build history is no longer loaded. And the layers that compose these images can be found via

docker inspect <image_id> | jq -r '.[].RootFS'

The layer content is stored under /var/lib/docker/aufs/diff if the storage driver in use is aufs. However, the layers are named with a randomly generated cache ID, and it seems the link between a layer and its cache ID is known only to Docker Engine, for security reasons. I am still looking for a way to find out:

  1. The corresponding relation between an image and its composing layer(s)
  2. Actual location and size of a layer on the disk

This blog provided much insight.

Dietary answered 28/3, 2017 at 12:11
In this SO entry I posted a rather naive way of answering the two questions I posted. – Dietary

I think the official documentation gives a pretty detailed explanation: https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/.

An image consists of many layers, which are usually generated from a Dockerfile: each line in the Dockerfile creates a new layer, and the result is an image, which is referred to in the form repo:tag, like ubuntu:15.04.

For more information, please consider reading the official docs above.

Apodictic answered 9/11, 2016 at 8:56

I used to think they are like diffs on previous layers. After reading some of the answers here I was not so sure, since they are described as sets of changes to the filesystem. I've written some Dockerfiles to show they are more like diffs, i.e., they really depend on previous layers.

Given these two Dockerfiles

FROM bash
RUN mkdir /data
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/three

and

FROM bash
RUN mkdir /data
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/three
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/two
RUN dd if=/dev/zero bs=1024 count=1024 of=/data/one

One would expect the same set of layers if they were just about changes to the filesystem, but this is not the case:

$ docker history img_1
IMAGE               CREATED             CREATED BY                                      SIZE
30daa166a9c5        6 minutes ago       /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
4467d16e79f5        6 minutes ago       /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
c299561fd031        6 minutes ago       /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
646feb178431        6 minutes ago       /bin/sh -c mkdir /data                          0B
78664daf24f4        2 weeks ago         /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>           2 weeks ago         /bin/sh -c #(nop)  ENTRYPOINT ["docker-entry…   0B
<more missing...>

and

$ docker history img_2
IMAGE               CREATED             CREATED BY                                      SIZE
f55c91305f8c        6 minutes ago       /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
29b3b627c76f        6 minutes ago       /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
18360be603aa        6 minutes ago       /bin/sh -c dd if=/dev/zero bs=1024 count=102…   1.05MB
646feb178431        6 minutes ago       /bin/sh -c mkdir /data                          0B
78664daf24f4        2 weeks ago         /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>           2 weeks ago         /bin/sh -c #(nop)  ENTRYPOINT ["docker-entry…   0B
<more missing...>

You can see that even though the resulting changes to the filesystem are the same in both cases, the order matters.
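
docker history prints image IDs rather than layer IDs, but the same order dependence shows up in how layer stacks are identified. The OCI image specification defines a ChainID that folds each layer's DiffID into the ID of everything beneath it. This Python sketch follows that definition; the DiffIDs are placeholders hashed from short labels, not real layer digests.

```python
import hashlib

def digest(text):
    return "sha256:" + hashlib.sha256(text.encode()).hexdigest()

def chain_ids(diff_ids):
    """ChainIDs as defined in the OCI image spec: the first equals the first DiffID,
    and each later one is the digest of the previous ChainID, a space, and the next DiffID."""
    chain = []
    for diff_id in diff_ids:
        chain.append(diff_id if not chain else digest(chain[-1] + " " + diff_id))
    return chain

# Placeholder DiffIDs standing in for the layers of the two Dockerfiles above.
mkdir, one, two, three = (digest(name) for name in ("mkdir", "one", "two", "three"))
img_1 = chain_ids([mkdir, one, two, three])
img_2 = chain_ids([mkdir, three, two, one])

print(img_1[0] == img_2[0])    # True: same first layer
print(img_1[-1] == img_2[-1])  # False: same diffs, different order
```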

Vitals answered 8/4, 2020 at 7:50
Worth pointing out that the images for the first 2 layers of the second run are the cached ones from the first run: 78664daf24f4 and 646feb178431. – Persinger

Thank you @David Castillo for the useful information. I think a layer is a binary change or instruction applied to an image that can be done or undone easily. Changes are applied step by step, one layer on top of another, which is why we call it a "layer".

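For more information, you can look at the image history with docker history. Older Docker versions could also print it as a tree, like this (the --tree option was deprecated, as the warning shows):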

docker images --tree
Warning: '--tree' is deprecated, it will be removed soon. See usage.
└─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
  └─59e359cb35ef Virtual Size: 85.18 MB
    └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
      └─c58b36b8f285 Virtual Size: 85.18 MB
        └─90ea6e05b074 Virtual Size: 118.6 MB
          └─5dc74cffc471 Virtual Size: 118.6 MB Tags: vim:latest

Herwick answered 2/3, 2016 at 4:25
I found some new information about layers: when Docker mounts the rootfs, it starts read-only, as in a traditional Linux boot, but then, instead of changing the file system to read-write mode, it takes advantage of a union mount to add a read-write file system over the read-only file system. In fact, there may be multiple read-only file systems stacked on top of each other. We think of each one of these file systems as a layer. – Herwick

My personal understanding is that we can compare a Docker layer to a git commit. Starting from your base image (your fresh master repo), you make several commits, and each commit changes the state of master. It's the same in Docker: every layer performs some operation based on the previous intermediate layer, and then becomes the new intermediate layer for the next one.

Illtimed answered 2/8, 2018 at 17:58

The thing I can add to this question thread is that you can use git as a metaphor to understand the concept of layers, images and containers.

Here is how it goes:

  1. A git repo corresponds to the image.
  2. A commit to a git repo corresponds to adding a layer.
    1. A git repo is started with git init; an image build, on the other hand, starts with FROM. While you can use multiple FROM instructions in multi-stage builds, the concept stays the same, since only the last stage becomes the final image.
    2. A git repo is built up from a series of commits, while an image is built from a series of commands/layers (except CMD), each of which creates a new layer of the image.
  3. Cloning and running a git repo corresponds to starting the container (CMD). Just as cloning a git repo typically checks out the latest commit, Docker starts a container from the most recently created layer of the image, together with everything beneath it, when you run docker run ...

A final note: to make sense of layers, it's hard to think of them as an isolated concept; rather, approach them as one of the building blocks that make containers and images work. Similarly, the commit is one of the building blocks of the git versioning system; it could have been called anything, but "commit" definitely makes sense :).

Scraggy answered 22/8, 2021 at 16:14

Layers are folders containing files and folders that were created as a result of building the image.

For example:

# Layer 1
FROM alpine:3.14
# Layer 2
RUN apk add --no-cache tree
# Layer 3
COPY test.txt /tmp/
ENTRYPOINT ["tree"]

This Dockerfile will create three directories, which are stored on the host system and "merged" together into a so-called union filesystem. The directories are not actually merged physically; a union mount is used to create the illusion that they are.

In the above example there would be:

# Layer 1
/var/lib/docker/overlay2/1d06...35310/diff
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var

# Layer 2
/var/lib/docker/overlay2/23wgom2anm2uysvg988r3tw9c/diff
etc  lib  usr  var
  usr/bin/tree

# Layer 3
/var/lib/docker/overlay2/41htpkg76b3zwg29kqsb103of/diff
tmp
  tmp/test.txt

All those directories would then be "merged" using a union mount, creating the final Linux filesystem, which would then be set as the root directory for the running process (aka the container; "tree" in this case) using chroot or a similar mechanism.
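
With the overlay2 storage driver, that union mount is the kernel's overlay filesystem. This hedged Python sketch only builds an equivalent mount command line and does not run it. The directory names are hypothetical, and Docker performs the mount itself rather than shelling out to mount; the point is the ordering, since overlayfs stacks lowerdir entries starting from the rightmost one, so the leftmost entry ends up on top.

```python
def overlay_mount_command(layer_dirs, upperdir, workdir, target):
    """Return a mount(8) command that overlays the given layer directories.

    layer_dirs is ordered bottom (base) to top. overlayfs treats the leftmost
    lowerdir entry as the top layer, so the list is reversed. upperdir receives
    the container's writes; workdir is scratch space overlayfs requires.
    """
    lowerdir = ":".join(reversed(layer_dirs))
    options = f"lowerdir={lowerdir},upperdir={upperdir},workdir={workdir}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]

# Hypothetical paths; actually running the command would need root on Linux.
cmd = overlay_mount_command(
    ["/tmp/layer1", "/tmp/layer2", "/tmp/layer3"],
    upperdir="/tmp/upper", workdir="/tmp/work", target="/tmp/merged",
)
print(" ".join(cmd))
```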

More info can be found here: https://martinheinz.dev/blog/44

Sequestration answered 18/10, 2022 at 17:07
