Why are Docker container images so large? [closed]
C

9

230

I built a simple image from a Dockerfile based on Fedora (initially 320 MB).

I added Nano (a tiny editor, about 1 MB), and the image grew to 530 MB. Then I added Git on top of that (30-ish MB), and the image size skyrocketed to 830 MB.

Isn't that insane?

I've tried exporting and importing the container to remove history and intermediate images. That saved about 25 MB, so my image is now 804 MB. I've also tried running many commands in a single RUN, but I still get the same initial 830 MB.

I'm having doubts about whether Docker is worth using at all. I've barely installed anything and I'm already approaching 1 GB. If I have to add something serious like a database, I might run out of disk space.

Does anyone else suffer from ridiculously large images? How do you deal with it?

Unless my Dockerfile is horribly incorrect?

FROM fedora:latest
MAINTAINER Me NotYou <[email protected]>
RUN yum -y install nano
RUN yum -y install git

But it's hard to imagine what could go wrong here.

Capet answered 24/6, 2014 at 18:57 Comment(5)
Where and how are you measuring the size of your container? Does yum clean all have any effect on the size? - Pinnatipartite
Expect images to be a fair size, since each one is the accumulation of its own layer, its parent image(s) and the base image. Also, yum installs not only the named apps but also their dependencies. docs.docker.com/terms/container - Slurry
Well, my "measuring" is running docker images, which states a hefty 830 MB in the last column. I might not know the actual size of my image, since docker images says that this 830 MB is the virtual size. But then again, what is the actual size of the image? - Capet
Closing this question as "not about programming or software development" is prima facie ... unreasonable. Also, the same question basically appears multiple times in the "Linked" subframe. Just saying. - Berwick
This question is very much about software development, and I wouldn't say it is anything less than crystal-clear and obvious. - Heppman
S
147

As @rexposadas said, images include all the layers, and each layer includes all the dependencies for what you installed. It is also important to note that the base images (like fedora:latest) tend to be very bare-bones. You may be surprised by the number of dependencies your installed software has.

I was able to make your installation significantly smaller by adding yum -y clean all to each RUN line:

FROM fedora:latest
RUN yum -y install nano && yum -y clean all
RUN yum -y install git && yum -y clean all

It is important to do that for each RUN, before the layer gets committed, or else deletes don't actually remove data. That is, in a union/copy-on-write file system, cleaning at the end doesn't really reduce file system usage because the real data is already committed to lower layers. To get around this you must clean at each layer.

$ docker history bf5260c6651d
IMAGE               CREATED             CREATED BY                                      SIZE
bf5260c6651d        4 days ago          /bin/sh -c yum -y install git; yum -y clean a   260.7 MB
172743bd5d60        4 days ago          /bin/sh -c yum -y install nano; yum -y clean    12.39 MB
3f2fed40e4b0        2 weeks ago         /bin/sh -c #(nop) ADD file:cee1a4fcfcd00d18da   372.7 MB
fd241224e9cf        2 weeks ago         /bin/sh -c #(nop) MAINTAINER Lokesh Mandvekar   0 B
511136ea3c5a        12 months ago                                                       0 B
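
For contrast, here is a minimal sketch of the pattern that does not help, assuming the same fedora:latest base. The cleanup runs in its own RUN, so the yum cache is already committed in the previous layer and the final image stays about as large as it would be without the cleanup:

```dockerfile
FROM fedora:latest
# Layer 1: nano and the yum cache/metadata are both committed here.
RUN yum -y install nano
# Layer 2: only records the deletions; the cached data still ships in layer 1.
RUN yum -y clean all
```

Running docker history on the result should show the install layer at its full size and the cleanup layer adding close to nothing.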
Selfcommand answered 24/6, 2014 at 22:48 Comment(6)
Thank you for investigating this, and yes, I was able to reduce my image size to roughly 635 MB (this is the value shown as the virtual image size by docker images). Is it possible to remove/delete/destroy those old layers? To be more specific: I would like to completely remove (based on your example) images 172743bd5d60, 3f2fed40e4b0, fd241224e9cf and 511136ea3c5a from the history, so that my virtual image size is more or less the same as the final layer's size, here ~260 MB. - Capet
(Too long for 1 comment) Unless the virtual image size has nothing to do with the actual size of the image on the HDD? If that is the case, then how/where do I check the actual size of my images? - Capet
You could docker export and then docker import again. That would flatten the layers. I don't think it would reduce the size, but I could be wrong. - Selfcommand
Yeah, but exporting does not save much. Nonetheless, I read on the web that what I see in docker images is the virtual image size. The actual size on the HDD remains a mystery to me, since according to the official documentation docker ps -s shows the real size on the HDD, which in my case was -1B. That sounds reasonable, minus 1 byte. I have gained some space on the HDD... seems legit. - Capet
According to #39223749, using RUN only once with line continuation (\) is better, right? - Ligroin
Wow, I learned this today. After having used docker for many years! Thank you O kind stranger! I managed to reduce the size of my image by half! - Mellar
L
81

Docker images are not large, you are just building large images.

The scratch image is 0B and you can use that to package up your code if you can compile your code into a static binary. For example, you can compile your Go program and package it on top of scratch to make a fully usable image that is less than 5MB.
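
A minimal sketch of the scratch approach, under a few assumptions that are not from the answer: the build context contains a Go module with a main package at its root, the golang:1.22 tag is used as the compiler image, and /app is just an illustrative binary name. It relies on a multi-stage build, so it needs a Docker version that supports multiple FROM lines:

```dockerfile
# Build stage: compile a statically linked binary (CGO disabled so no libc is needed).
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Final stage: start from the empty scratch image and copy in only the binary.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The final image then contains little more than the binary itself. If the program makes TLS connections or needs time zone data, those files would have to be copied in as well.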

The key is to not use the official Docker images; they are too big. Scratch isn't all that practical either, so I'd recommend using Alpine Linux as your base image. It is ~5 MB; then add only what your app requires. This post about Microcontainers shows you how to build very small images based on Alpine.

UPDATE: many official Docker images now offer Alpine-based variants, so they are fine to use.
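
As a rough sketch of the Alpine approach applied to the packages from the question, assuming the alpine:3.19 tag (any current tag should do) and that nano and git are the only tools needed:

```dockerfile
FROM alpine:3.19
# --no-cache fetches the package index on the fly and does not store it in the layer.
RUN apk add --no-cache nano git
```

Keep in mind that Alpine uses musl instead of glibc, so some prebuilt binaries may not run on it unchanged.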

Lowder answered 25/1, 2016 at 22:17 Comment(1)
Thankfully, the Docker Official images are also moving to use an Alpine base, so more and more you can use the regular images instead of depending on iron.io's versions. See brianchristner.io/docker-is-moving-to-alpine-linux - Ruthenium
S
54

Here are some more things you can do:

  • Avoid multiple RUN commands where you can. Put as much as possible into one RUN command (using &&).
  • clean-up unnecessary tools like wget or git (which you only need for download or building stuff, but not to run your process)

With both of these and the recommendations from @Andy and @michau, I was able to shrink my nodejs image from 1.062 GB to 542 MB.

Edit: One more important thing: "It took me a while to really understand that each Dockerfile command creates a new container with the deltas. [...] It doesn't matter if you rm -rf the files in a later command; they continue exist in some intermediate layer container." So I put apt-get install, wget, npm install (with git dependencies) and apt-get remove into a single RUN command, and my image is now only 438 MB.
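
A minimal sketch of that single-RUN pattern, assuming a Debian-based image (debian:bookworm-slim here) and a hypothetical archive at https://example.com/tool.tar.gz standing in for whatever you actually download. The download tool is installed, used and purged within one layer, so it never ends up in the image:

```dockerfile
FROM debian:bookworm-slim
# Install, download, extract and clean up in a single layer.
# The URL and the /opt target are placeholders for illustration only.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates wget \
 && wget -O /tmp/tool.tar.gz https://example.com/tool.tar.gz \
 && tar -xzf /tmp/tool.tar.gz -C /opt \
 && rm /tmp/tool.tar.gz \
 && apt-get purge -y --auto-remove wget \
 && rm -rf /var/lib/apt/lists/*
```

Splitting any of the cleanup steps into a separate RUN would leave the downloaded archive or the apt lists in an earlier layer.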

Edit 29/06/17

Docker v17.06 brings a new feature for Dockerfiles, multi-stage builds: you can have multiple FROM statements inside one Dockerfile, and only the content from the last FROM ends up in your final Docker image. This is useful for reducing image size, for example:

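FROM nodejs as builder
WORKDIR /var/my-project
# Refresh the package index and pass -y so the install runs non-interactively
RUN apt-get update && \
    apt-get install -y ruby python git openssh gcc && \
    git clone my-project . && \
    npm install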

FROM nodejs
COPY --from=builder /var/my-project /var/my-project

This results in an image containing only the nodejs base image plus the content of /var/my-project from the first stage, but without ruby, python, git, openssh and gcc!

Scab answered 22/8, 2016 at 11:50 Comment(0)
A
26

Yes, those sizes are ridiculous, and I really have no idea why so few people notice that.

I made an Ubuntu image that is actually minimal (unlike other so-called "minimal" images). It's called textlab/ubuntu-essential and is 60 MB.

FROM textlab/ubuntu-essential
RUN apt-get update && apt-get -y install nano

The above image is 82 MB after installing nano.

FROM textlab/ubuntu-essential
RUN apt-get update && apt-get -y install nano git

Git has many more prerequisites, so the image gets larger, about 192 MB. That's still less than the initial size of most images.

You can also take a look at the script I wrote to make the minimal Ubuntu image for Docker. You can perhaps adapt it to Fedora, but I'm not sure how much you will be able to uninstall.

Armallas answered 9/12, 2014 at 17:8 Comment(0)
E
19

The following helped me a lot:

After removing unused packages inside my container (e.g. removing redis freed 1200 MB), I did the following:

  1. docker export [containerID] -o containername.tar
  2. docker import -m "commit message here" containername.tar imagename:tag

The layers get flattened. The new image is smaller because I removed packages from the container, as stated above.

It took me a long time to understand this, which is why I've added my answer.

Endearment answered 25/8, 2016 at 11:21 Comment(1)
You can combine both steps into a single step: docker export <CONTAINER ID> | docker import - some-image-name:latest - Indoor
C
16

As a best practice, you should execute a single RUN command, because every RUN instruction in the Dockerfile writes a new layer in the image, and every layer requires extra space on disk. To keep the number of layers to a minimum, any file manipulation such as installing, moving, extracting or removing should ideally happen in a single RUN instruction.

FROM fedora:latest
RUN yum -y install nano git && yum -y clean all
Coachman answered 30/9, 2016 at 12:37 Comment(0)
E
6

Docker Squash is a really nice solution to this. You can run $packagemanager clean in the last step instead of in every line, and then run docker-squash to collapse all of the layers.

https://github.com/jwilder/docker-squash

Equitant answered 26/3, 2016 at 15:42 Comment(0)
B
1

Yes the layer system is quite surprising. If you have a base image and you increment it by doing the following:

# Test
#
# VERSION       1

# use the centos base image provided by dotCloud
FROM centos7/wildfly
MAINTAINER JohnDo 

# Build it with: docker build -t "centos7/test" test/

# Change user into root
USER root

# Remove temporary files and the contents of /wildfly
RUN rm -rf /tmp/* \
    && rm -rf /wildfly/* 

The image has exactly the same size. In other words, you have to pack all the extract, install and cleanup steps into a single RUN to keep the image as small as the software it installs.

This makes life much harder...

What docker build is missing is a way to run steps without committing a layer.

Behlau answered 16/11, 2016 at 18:10 Comment(0)
P
0

We had a similar issue in our docker build process. Each image built was significantly larger than the previous one. As it turned out, tar.gz files were being included in the image. Among these were the compressed images we upload to a server, so each image accidentally contained the prior images. Image sizes were soon in the 8 GB range.

.dockerignore is your friend. Make sure anything in your project not necessary to build the image is in the ignore file.

Plumley answered 25/1, 2023 at 21:31 Comment(0)
