Docker Image > 1GB in size from python:3.8.3-alpine

I'm pretty new to docker and, although I've read lots of articles, tutorials and watched YouTube videos, I'm still finding that my image size is in excess of 1 GB when the alpine image for Python is only about 25 MB (if I'm reading this correctly!).

I'm trying to work out how to make it smaller (if in fact it needs to be).

[Note: I've been following tutorials to create what I have below. Most of it makes sense .. but some of it feels like voodoo]

Here is my Dockerfile:

FROM python:3.8.3-alpine

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

RUN mkdir -p /home/app

RUN addgroup -S app && adduser -S app -G app

ENV HOME=/home/app
ENV APP_HOME=/home/app/web
RUN mkdir $APP_HOME
RUN mkdir $APP_HOME/staticfiles
RUN mkdir $APP_HOME/mediafiles
WORKDIR $APP_HOME

RUN pip install --upgrade pip

COPY requirements.txt .

RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql-dev \
    && apk add jpeg-dev zlib-dev libjpeg \
    && apk add --update --no-cache postgresql-client

RUN pip install -r requirements.txt

RUN apk del build-deps

COPY entrypoint.prod.sh $APP_HOME

COPY . $APP_HOME

RUN chown -R app:app $APP_HOME

USER app

ENTRYPOINT ["/home/app/web/entrypoint.prod.sh"]

Using Pillow and psycopg2-binary has caused a world of confusion and hurt. Particularly with the following:

RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql-dev \
    && apk add jpeg-dev zlib-dev libjpeg \
    && apk add --update --no-cache postgresql-client

RUN pip install -r requirements.txt

RUN apk del build-deps

This was originally:

RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql \
    && apk add postgresql-dev \
    && apk add --update --no-cache postgresql-client \
    && pip install psycopg2-binary \
    && apk add jpeg-dev zlib-dev libjpeg \
    && pip install Pillow \
    && apk del build-deps

I really have no idea how much of the above I need to make it work. I think there might be a way of reducing the build.

I know there is a way to build the original image and then use that to transfer things over, but the only tutorials are confusing and I am struggling to get my head around this without adding more complexity. I really wish I had someone who could just explain it in person.

I also don't know if the size of the image is coming from the requirements.txt file. I'm using django and there are a number of requirements:

requirements.txt

asgiref==3.4.1
Babel==2.9.1
boto3==1.18.12
botocore==1.21.12
certifi==2021.5.30
charset-normalizer==2.0.4
crispy-bootstrap5==0.4
defusedxml==0.7.1
diff-match-patch==20200713
Django==3.2.5
django-anymail==8.4
django-compat==1.0.15
django-crispy-forms==1.12.0
django-environ==0.4.5
django-extensions==3.1.3
django-hijack==2.3.0
django-hijack-admin==2.1.10
django-import-export==2.5.0
django-money==2.0.1
django-recaptcha==2.0.6
django-social-share==2.2.1
django-storages==1.11.1
et-xmlfile==1.1.0
fontawesomefree==5.15.3
gunicorn==20.1.0
idna==3.2
jmespath==0.10.0
MarkupPy==1.14
odfpy==1.4.1
openpyxl==3.0.7
Pillow==8.3.1
psycopg2-binary==2.9.1
py-moneyed==1.2
python-dateutil==2.8.2
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
s3transfer==0.5.0
six==1.16.0
sqlparse==0.4.1
stripe==2.60.0
tablib==3.0.0
urllib3==1.26.6
xlrd==2.0.1
xlwt==1.3.0

The question I have is: how do I make the image smaller? And does it even need to be smaller?

I'm just trying to find the best way to deploy the Django app to DigitalOcean, and there is a world of confusion with so many approaches, tutorials, etc. I don't know if using Docker makes it easier. Do I just use their App Platform? Will that provide SSL? What are the advantages of using Docker?

docker-compose file (for reference)

version: '3.7'

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile.prod
    command: gunicorn maffsguru.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - static_volume:/home/app/web/staticfiles
      - media_volume:/home/app/web/mediafiles
    expose:
      - 8000
    env_file:
      - .env.docker
    depends_on:
      - db
  db:
    image: postgres:12.0-alpine
    env_file:
      - .env.docker
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    ports:
      - 5432:5432
  nginx:
    build: ./nginx
    volumes:
      - static_volume:/home/app/web/staticfiles
      - media_volume:/home/app/web/mediafiles
    ports:
      - 1337:80
    depends_on:
      - web

volumes:
  postgres_data:
  static_volume:
  media_volume:

Just to say ... the above all seems to work ... but I don't know if the size of the image etc is going to be a problem?

I am also confused as to why Nginx seems to need me to go to http://0.0.0.0:1337 to view the site. Isn't the whole point to be able to view it by navigating to plain http://0.0.0.0/ (no port)?

Thanks for any advice or guidance you might be able to give and apologies for the random nature of my questions

Chlamydate asked 5/8, 2021 at 0:07

Welcome to Docker! It can be quite a thing to wrap your head around, especially when you're starting out, but you're asking really valid questions that are all pertinent.

Reducing Size

How to

A great place to start is Docker's own Dockerfile best practices page:

https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

It explains neatly how each directive (COPY, RUN, ENV, etc.) creates an additional layer, increasing your container's size. Importantly, it shows how to reduce your image size by minimising the number of directives. The key to a lot of that minimisation is chaining commands in RUN statements with &&.
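
To make that concrete, here is a rough, untested sketch (based only on the Dockerfile in the question, with COPY requirements.txt . still happening beforehand) of how the separate mkdir, apk and pip steps could be folded into fewer RUN layers:

RUN mkdir -p $APP_HOME/staticfiles $APP_HOME/mediafiles

RUN apk add --no-cache postgresql-dev jpeg-dev zlib-dev postgresql-client \
    && apk add --no-cache --virtual build-deps gcc python3-dev musl-dev \
    && pip install --upgrade pip \
    && pip install -r requirements.txt \
    && apk del build-deps

Because apk del build-deps runs in the same RUN as the install, the compiler is gone before the layer is committed, instead of sitting in an earlier layer that a later RUN can't shrink.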

Something else I note in your Dockerfile is one specific line:

COPY . $APP_HOME

Now, depending on how you build your container (specifically, what folder you pass to Docker as the context), this will copy EVERYTHING it has available to it. Chances are, this is bringing in your venv folder and the like if you have one. I suspect this may be the biggest contributor to your image size. You can mitigate this by copying in only what you need explicitly, or by using a .dockerignore file.
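
For example, a .dockerignore next to the Dockerfile might look something like this (the entries are hypothetical; adjust them to whatever actually lives in your project folder):

venv/
.git/
__pycache__/
*.pyc
*.log
.env.docker
staticfiles/
mediafiles/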

I built your image (without any source code, and without copying in entrypoint.sh), and it came out to 710 MB as a base. It could be a good idea to check the size of your source code and see if anything else is getting in there. After I rearranged some of the commands to reuse directives, the image was 484 MB, which is considerably smaller! If you get stuck, I can pop it into a gist on GitHub for you and walk you through it; however, the Docker documentation should hopefully get you going.

Why?

Well, larger applications / images aren't inherently bad, but with any increase in data, some operations may be slower.

When I say operations, I tend to mean pulling images from a registry, or pushing them to publish. It will take longer to transfer 1GB than it will 50MB.

There's also a consideration to be made when you scale your containers. While the image size does not necessarily correlate directly with how much disk you will use when you start a container, it will certainly increase the requirements of the machine you're running on, and can rule out smaller devices.

Docker

The advantages of using Docker are widespread, and I can't cover them all here without submitting my writing for thesis defence ;-)

But it mainly boils down to the following points:

  • A lot of providers support running your applications in Docker
  • Dockerfiles help you build your application in a consistent environment, meaning you don't have to configure each host your app runs on, or worry about version clashes
  • Containers let you develop and run your application in a consistent (and identical) environment
  • Containers usually provide really nice networking capabilities. An example you will already have encountered: within Docker Compose, you can reach other containers simply through their hostname

Nginx

You've set things up well there, from what I can gather! I imagine nginx is 'telling you' (via the logs?) to navigate to 0.0.0.0 because that is what it has bound to inside the container. Now, you've forwarded traffic with 1337:80. Docker follows the format host:container, so this means that traffic on localhost:1337 will be directed to the container's port 80. You may need to adjust this based on your nginx configuration, but rest assured you will be able to navigate to localhost in your browser and see your website once everything is set up.
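
For example (and the comments below confirm this is what ended up working), serving the site on plain localhost would just mean changing that one mapping for the nginx service in the compose file:

  nginx:
    build: ./nginx
    ports:
      - 80:80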

Let me know if you need help with any of the above, or want more resources to aid you. Happy to correspond and walk you through anything anytime, given we seem to be in the same timezone.

Preuss answered 5/8, 2021 at 1:18
Thank you so much for your amazing reply. I really appreciate it! I'll look at copying over the files which don't include the venv! That was a newbie mistake. I'll also absolutely take you up on your offer of help. I really want to understand what I'm doing and ... mostly ... I'm getting there ... but still someway to go. Then need to work out how to make it all work with Digital Ocean! Thanks again :) – Chlamydate
No worries! I rearranged your file locally and managed to shave it down a fair amount, you can see the edit above under "How to". Not a "Here you go, I fixed it for you", but at the very least a "It's possible, give it a go!" Info on .dockerignore can be found here: docs.docker.com/engine/reference/builder/#dockerignore-file Best of luck! Not too familiar with DO myself, but I imagine they will have some sort of ingress offering for you which can terminate TLS: docs.digitalocean.com/products/accounts/security/certificates – Preuss
Again ... thanks so much. I've combined all the RUN commands and used the .dockerignore and the image size is now 528MB. Much better! I'm impressed you got it to 484MB!!! Still can't seem to get the ports to work. If I reverse the 80 and 1337 on the nginx nothing serves. I'll have a further scour of the web and see what I can come up with. You have been amazing. Thanks so much! – Chlamydate
Yours will likely be that bit larger due to the code being copied in, but a 50% reduction in size isn't bad for a couple of && statements, huh! Having re-read your file and question: if you can successfully navigate to your site on localhost:1337, then all you need to do to serve it on 80 (the default web port) is change - 1337:80 to - 80:80; then requests to localhost:80 will be directed to nginx's port 80 in the container. – Preuss
I'm a quick study lol. I worked out the 80:80 and it works. Tried to make it work with DO and it's a bust for tonight but going to come at it fresh tomorrow morning. Stay safe! – Chlamydate
Unfortunately, Docker's best practices document is not ideal in a bunch of ways. For example, it recommends Alpine, which is fine for Go but very bad for Python (see my other answer). – Warrigal
Some great links in that answer, with a really clear explanation and comparison between the two. Also worth considering. Though I'd contest the claim that the best practices page is not helpful: it's a great start for a large number of users with really good general advice. I'd argue Go isn't the only language that works well with Alpine (C# and .NET do really well on Alpine), and perhaps Python is the weird and wonderful exception. – Preuss
musl (the libc in Alpine) has also been known to have behaviour differences from glibc, which can be a pain when you hit an obscure bug. Go suffers from this least, since it barely uses libc at all. There are other problems with the best practices page but, to be fair, the most egregious one (telling you not to run apt-get upgrade) has been fixed. – Warrigal

Notice that you have to install a compiler. That compiler takes a lot of space.

Most Python packages include pre-compiled binary packages, so why do you need a compiler? Because you are using Alpine. Binary packages (==wheels) from PyPI don't work on Alpine.

So:

  1. Switch away from Alpine base image to e.g. python:3.8-slim-buster.
  2. Get rid of the compiler install, the headers, and so on; you probably don't need any of it.
  3. Enjoy your new fast build, and quite possibly much smaller image.

Details: https://pythonspeed.com/articles/alpine-docker-python/
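
As a hedged sketch of points 1 and 2 (assuming a Debian-based image, where psycopg2-binary and Pillow normally install as pre-built wheels and need no compiler or headers), the install section of the question's Dockerfile might reduce to something like:

FROM python:3.8-slim-buster

# no gcc, python3-dev or musl-dev needed: the wheels come pre-compiled
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-client \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

postgresql-client is kept here only on the assumption that something (e.g. the entrypoint script) shells out to psql to wait for the database; drop it if nothing does.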

The alternative is a multi-stage build, where your final image doesn't include the unnecessary compiler. This adds more complexity of course.

Starting point for that (it's a three-article series): https://pythonspeed.com/articles/smaller-python-docker-images/
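
Very roughly, the pattern is: a first stage that has the compiler and installs everything into a virtualenv, and a final stage that copies only that virtualenv across. A sketch (under the same Debian-slim assumption as above, not the articles' exact recipe):

# build stage: compiler available, dependencies installed into a virtualenv
FROM python:3.8-slim-buster AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# final stage: no compiler, just the virtualenv and the application code
FROM python:3.8-slim-buster
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /home/app/web
COPY . .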

Warrigal answered 5/8, 2021 at 12:47
Thanks so much! I'll try to change the base image and see what happens. I managed to follow the previous answer and reduced it to just under 600MB. Will see what I can do now. Sadly, this didn't help with DigitalOcean so still can't host it ... but will add that as a problem for solving today. – Chlamydate

For those who are here after trying everything and whose Docker image is still large: check your log file, if you have set one up. In my case, I had to add the log file to the .dockerignore file after discovering that it accounted for 6.69 GB of my Django app's 7 GB image. Doing this reduced the image size from 7 GB to about 0.3 GB (300 MB).
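
For instance, assuming the log is written somewhere under the project directory (the exact path will be your own), a couple of lines in .dockerignore keep it out of the build context:

*.log
logs/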

Please note that I'm still learning the concept of docker, images and containers.

Sapphirine answered 23/7, 2024 at 14:57
