How does the new Docker --squash work

In Docker 1.13 the new --squash parameter was added.

I'm now hoping to reduce the size of my images as well as being able to "hide" secret files I have in my layers.

Below you can now see the difference from doing a build with and without the --squash parameter.

Without --squash: [image]

With --squash: [image]

Now to my question.

Suppose I add a secret file in my first layer, use the secret file in my second layer, remove it in the third layer, and then build with the --squash flag.

Will there be any way now to get the secret file?

Flexile answered 20/1, 2017 at 13:0 Comment(0)

If I add a secret file in my first layer, then use the secret file in my second layer, and then finally remove my secret file in the third layer, and then build with the --squash flag.

Will there be any way now to get the secret file?

Answer: Your image won't have the secret file.

How --squash works:

Once the build is complete, Docker creates a new image loading the diffs from each layer into a single new layer and references all the parent's layers.

In other words: when squashing, Docker will take all the filesystem layers produced by a build and collapse them into a single new layer.

This can simplify the process of creating minimal container images, but may result in slightly higher overhead when images are moved around (because squashed layers can no longer be shared between images). Docker still caches individual layers to make subsequent builds fast.

Note that this feature squashes only the newly built layers into a single layer; it does not squash down to scratch, so the layers of the base image are kept as they are.
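
To make the scenario from the question concrete, here is a minimal sketch of a three-layer build. The base image and the file names (secret.txt, /tmp/secret-size) are placeholders chosen for illustration and do not come from the original post.

```dockerfile
# Hypothetical reproduction of the question's scenario.
# secret.txt must exist in the build context; the name is a placeholder.
FROM alpine:3.19

# Layer 1: the secret enters the image.
COPY secret.txt /tmp/secret.txt

# Layer 2: the secret is used (here we only record its size).
RUN wc -c < /tmp/secret.txt > /tmp/secret-size

# Layer 3: the secret is deleted, but the diff stored in layer 1 still holds it.
RUN rm /tmp/secret.txt
```

Built without --squash, the image ships all three diffs, so the first layer still carries secret.txt. Built with --squash (on a daemon with experimental features enabled), the three diffs are merged and the resulting layer should not contain secret.txt. The unsquashed layers remain in the local build cache on the build host.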

Side notes:

Docker 1.13 also has support for compressing the build context that is sent from CLI to daemon using the --compress flag. This will speed up builds done on remote daemons by reducing the amount of data sent.

Note that as of Docker 1.13, --squash is an experimental feature.

Update 2024: --squash does not work with BuildKit. Using it there produces this warning:

WARNING: experimental flag squash is removed with BuildKit. You should squash inside build using a multi-stage Dockerfile for efficiency.

As the warning suggests, you need to use multi-stage builds instead of squashing layers.

Example:

# syntax=docker/dockerfile:1
FROM golang:1.21
WORKDIR /src
COPY <<EOF ./main.go
package main

import "fmt"

func main() {
  fmt.Println("hello, world")
}
EOF
RUN go build -o /bin/hello ./main.go

FROM scratch
COPY --from=0 /bin/hello /bin/hello
CMD ["/bin/hello"]
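
Applied to the secret-file scenario from the question, the same multi-stage idea might look like the sketch below. The stage name, secret.txt, and /out/result are assumptions made up for illustration.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical sketch: secret.txt and /out/result are placeholder names.
FROM alpine:3.19 AS build
COPY secret.txt /tmp/secret.txt
# Use the secret to produce an artifact that does not contain it.
RUN mkdir /out && wc -c < /tmp/secret.txt > /out/result

FROM alpine:3.19
# Only the artifact crosses the stage boundary; secret.txt stays in "build".
COPY --from=build /out/result /out/result
CMD ["cat", "/out/result"]
```

The layers of the final image contain only /out/result. The build stage, including the secret, can still sit in the build cache on the build host, and anyone who builds with --target build gets an image that contains it, so this keeps the secret out of the published image rather than off the build machine.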
Drucill answered 20/1, 2017 at 13:23 Comment(6)
If you were to do this without --squash, would the file be accessible somehow by reading the layers on the file system? In a test I made, the layers that contain the secret file are listed as "missing". I assume this means the secret file cannot be accessed by digging into the layers? - Itinerant
If you published or distributed the image, yes, someone could get into one of the other layers and look at the secret file. Ever notice how when you pull a Docker image you'll sometimes see it pulling several layers? You'll notice each layer has a hash ID - you can actually use those hash IDs directly to launch containers from any layer in the stack. From there a user could override entrypoint and command, get a shell and poke around. (Or they could just go look in the /var/lib/docker folder and poke around there.) - Revamp
For anyone landing here and wondering why they do not see --squash among the options shown by docker build --help: you need to turn on experimental features for the daemon. On typical Linux installs, edit /etc/docker/daemon.json and add "experimental": true at the end of the block. Restart the Docker daemon for it to take effect. How to do that depends on the Linux distro; on Ubuntu/Debian it is sudo systemctl restart docker. On macOS, the JSON to edit can be found in the Docker Engine section of the preferences panel; click the gear icon at the upper right of the desktop app. - Hoax
For rootless Docker, the "experimental": true setting should go in the ~/.config/docker/daemon.json file. - Nutt
This answer is no longer correct since --squash has been removed from Docker (from BuildKit, really). The answer from @Boden should be considered the de facto answer here. - Apps
@Apps thanks for the notification, updated the answer. - Drucill

In 2023

With BuildKit, --squash no longer works. You can still run docker build --squash=true ., but you will get this message:

WARNING: experimental flag squash is removed with BuildKit. You should squash inside build using a multi-stage Dockerfile for efficiency.

How does this work?

Build in a named stage, then start a new stage FROM scratch and copy the entire filesystem into it, which produces a single layer:

FROM nice-image:latest AS base

RUN command-that-gives-huge-diff
RUN more-things

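# can also do this multiple times; optionally name the stage with "AS something"
# (comments must be on their own line, a trailing comment would break FROM)
FROM scratch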

COPY --from=base / /

Practical example

Installing anything on archlinux:latest (afaik) requires initializing the pacman database first, which fetches the latest repository metadata. Installing a package without also upgrading will probably not work, because the fresh metadata references newer packages than the ones installed in the image. So, to update, run pacman -Syu, which shows:

Total Download Size: 87.56 MiB
Total Installed Size: 359.99 MiB
Net Upgrade Size: 4.99 MiB

That is a huge, avoidable increase in image size. Because each layer stores its own diff, the image grows by more than 360 MiB even though the filesystem itself grows by only about 5 MiB.

Here is the relevant Dockerfile, in which I install gnupg1:

ARG BASE=archlinux:latest

# use this stage for running system updates
FROM $BASE AS base

RUN pacman -Syu --noconfirm

# start magic
FROM scratch as stage1

COPY --from=base / /
# end magic

# start arch build stuff
RUN pacman -Syu --noconfirm && \
    pacman -S --noconfirm base-devel bzip2 curl libldap libusb-compat pinentry readline zlib && \
    useradd --create-home --shell /bin/bash user

USER user
WORKDIR /home/user

RUN curl -L -O https://aur.archlinux.org/cgit/aur.git/snapshot/gnupg1.tar.gz && \
    tar -xvf gnupg1.tar.gz && \
    cd gnupg1 && \
    makepkg --skippgpcheck
#end arch build stuff

FROM scratch AS final

# so now this is the initial container
COPY --from=base / /

# copy built package from stage1
COPY --from=stage1 /home/user/gnupg1/gnupg1-1.4.23-1-x86_64.pkg.tar.zst /home/gnupg1-1.4.23-1-x86_64.pkg.tar.zst

#install the package (no need for -Syu, since we copied from base) and clean
RUN pacman -U --noconfirm /home/gnupg1-1.4.23-1-x86_64.pkg.tar.zst && \
    pacman -Scc --noconfirm && \
    rm -f /var/lib/pacman/sync/*

#this saves only like 10MB, so not really necessary
FROM scratch

COPY --from=final / /
Boden answered 6/9, 2023 at 13:4 Comment(3)
This is indeed one of the best options when we have to install something on an image and use the result directly as the final image. - Grieve
If you run curl, configure, make, make install, and rm in a single RUN command, it should only add the diff of the final result, if I'm correct. - Boden
Unfortunately, FROM scratch + COPY --from=base / / is not enough to squash an image; you also need to find and copy the original ENV, CMD, ENTRYPOINT, etc. instructions. - Sixtieth
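
As the last comment points out, a stage that starts FROM scratch inherits no image metadata. Below is a minimal sketch of carrying it over by hand, reusing the placeholder image nice-image:latest from the answer. The ENV and CMD values are illustrative assumptions, not the real values of any image; the actual ones can be read with docker image inspect --format '{{json .Config}}' nice-image:latest.

```dockerfile
FROM nice-image:latest AS base
RUN command-that-gives-huge-diff

FROM scratch
COPY --from=base / /

# Re-declare the metadata that FROM scratch dropped.
# The values below are placeholders; copy the real ones from the
# output of docker image inspect on the original image.
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CMD ["/bin/sh"]
```

Other settings such as ENTRYPOINT, WORKDIR, USER, EXPOSE, VOLUME, and LABEL are lost in the same way and would need the same treatment if the original image sets them.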
