How to make lightweight docker image for python app with pipenv

I can produce a working image for my Python app with the following simple Dockerfile:

FROM python:3.7
WORKDIR /myapp
COPY Pipfile* ./
RUN pip install pipenv
RUN pipenv install --system --deploy
COPY src .
CMD ["python3", "app.py"]

However, this produces a ~1 GB image, which may contain temporary files and is heavy to deploy. I only need the full Python image for building. My app runs successfully on the alpine variant, so I can write a two-stage Dockerfile:

FROM python:3.7 as builder
# pipenv is not part of the base image, so install it before using it
RUN pip install pipenv
COPY Pipfile* ./
RUN pipenv lock --requirements > requirements.txt
RUN python3 -m venv /venv
RUN /venv/bin/pip install --upgrade pip
RUN /venv/bin/pip install -r requirements.txt

FROM python:3.7-alpine
COPY --from=builder /venv /venv
WORKDIR /myapp
COPY src .
CMD ["/venv/bin/python3", "app.py"]

So far so good: it also works and is 6 times smaller. But this scheme is considered a "stub" with some drawbacks:

  • It has an unnecessary extra COPY --from=builder step
  • It does not utilize pipenv but also needs pip for installing (+1 extra step; pipenv lock + pip install is always slower than just pipenv install)
  • It does not install system-wide but into /venv, which is to be avoided inside a container
  • Minor: the build pollutes the intermediate-image cache more and requires downloading both image variants

How can I combine these two approaches to get a lightweight alpine-based image with pipenv that lacks the drawbacks mentioned?

Or can you offer your production Dockerfile ideas?

Hampshire answered 9/10, 2019 at 8:31 Comment(1)
20

The problem comes when you need things like ciso8601, or other libraries that require a build step. Build tools are not included in either the slim or the alpine variants, to keep the footprint small.

So to install deps, you'll have to:

  • Install build tools
  • Deploy dependencies from Pipfile.lock system-wide
  • Uninstall build tools and clean caches

And do all 3 actions inside a single RUN layer, like the following:

FROM python:3.7-slim

WORKDIR /app

# both files are explicitly required!
COPY Pipfile Pipfile.lock ./

RUN pip install pipenv && \
  apt-get update && \
  apt-get install -y --no-install-recommends gcc python3-dev libssl-dev && \
  pipenv install --deploy --system && \
  apt-get remove -y gcc python3-dev libssl-dev && \
  apt-get autoremove -y && \
  pip uninstall pipenv -y

COPY app ./

CMD ["python", "app.py"]

  • Installing the build tools costs around 300MiB (removed again at the end of the RUN) and some extra build time
  • Uninstalling pipenv saves another 20MiB (about 10% of the resulting size)
  • Splitting this into separate RUN commands would not delete the data from earlier layers and would result in a ~500MiB image. That is how Docker layers work.

So that results in a perfectly working ~200MiB image, which

  • is 5 times smaller than the original python:3.7 (which is >1.0GiB)
  • has no alpine incompatibilities (these are typically tied to the glibc replacement)
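
To illustrate the point about layers, here is a sketch of the split variant that the answer warns against. It uses the same commands as the Dockerfile above, only divided across three RUN instructions; the ~500MiB figure is the one quoted above, not something measured here.

```dockerfile
FROM python:3.7-slim

WORKDIR /app

COPY Pipfile Pipfile.lock ./

# Layer 1: the build tools are written into this layer and stay in the image.
RUN pip install pipenv && \
  apt-get update && \
  apt-get install -y --no-install-recommends gcc python3-dev libssl-dev

# Layer 2: the dependencies are installed on top.
RUN pipenv install --deploy --system

# Layer 3: the files are only hidden here; layer 1 keeps its full size.
RUN apt-get remove -y gcc python3-dev libssl-dev && \
  apt-get autoremove -y && \
  pip uninstall pipenv -y

COPY app ./

CMD ["python", "app.py"]
```

The resulting container behaves the same, but every layer is shipped with the image, so removing packages in a later layer does not shrink it.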

At the moment we're fine with the slim (debian buster) build variants, preferring slim over alpine for better compatibility. If you're really after further size optimization, I'd recommend taking a look at some excellent builds by these guys:

Hampshire answered 20/11, 2019 at 12:20 Comment(1)
Alpine produces a smaller image. - Airdrome
8

How about,

FROM python:3.7-alpine

WORKDIR /myapp

COPY Pipfile* ./

RUN pip install --no-cache-dir pipenv && \
    pipenv install --system --deploy --clear

COPY src .
CMD ["python3", "app.py"]
  1. It utilises the smaller Alpine version.
  2. You won't have any unnecessary cache files left over, thanks to the --no-cache-dir option for pip and the --clear option for pipenv.
  3. You also deploy outside of venv.

You can also add && pip uninstall pipenv -y after pipenv install --system --deploy --clear in the same RUN command to eliminate space taken by pipenv if that extra image size bothers you.
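
As a sketch, the Dockerfile above with that extra step appended would look like this. Nothing is added beyond the commands already mentioned in this answer; note that uninstalling pipenv this way does not remove the packages pip pulled in as pipenv's own dependencies.

```dockerfile
FROM python:3.7-alpine

WORKDIR /myapp

COPY Pipfile* ./

# Install pipenv, deploy the locked dependencies system-wide, then remove
# pipenv again. Everything runs in one RUN so the removal shrinks the layer.
RUN pip install --no-cache-dir pipenv && \
    pipenv install --system --deploy --clear && \
    pip uninstall pipenv -y

COPY src .
CMD ["python3", "app.py"]
```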

Hydrogenous answered 9/10, 2019 at 10:21 Comment(3)
pipenv install would fail, as (in my particular case) it requires the full python:3.7 for building several packages. As you may see, it is the second step. (Am I a fool, using a 2-step build, if I can use 1-step?) Btw, thanks for some useful command-line parameters! - Hampshire
pipenv won't fail as we are installing pipenv before we invoke it. As for whether you are a fool, how do you think you look making that passive-aggressive statement to someone who took time to reply to your question? - Hydrogenous
In that particular case pipenv install would fail. It required the full python:3.7 for building several packages, as well as some libraries that are missing in alpine images. And install-uninstall just leads to slow builds. Anyway, we've abandoned and deprecated Python development in favor of Golang. Now we are using fast builds and megabyte-sized images, so there is no need to mess with an outdated, entry-level, slow interpreter tool anymore in a modern world. All Python code is gone from our projects. Anyway, thanks for your reply. - Hampshire
3

I am using micropipenv for the job, which describes itself as

A lightweight wrapper for pip to support requirements.txt, Pipenv and Poetry lock files or converting them to pip-tools compatible output. Designed for containerized Python applications but not limited to them.

An image created from it would look like the following. Since the alpine base image lacks a toml parser we have to use the version of micropipenv that includes the toml extras (micropipenv[toml] instead of micropipenv).

FROM python:3.9-alpine

WORKDIR /myapp
COPY Pipfile Pipfile.lock ./

# Install dependencies
RUN pip install --no-cache-dir micropipenv[toml] \
  && micropipenv install --deploy \
  && pip uninstall -y micropipenv[toml]

COPY src .
CMD ["python3", "app.py"]
Macon answered 29/12, 2021 at 18:10 Comment(0)
2

It has unnecessary extra COPY --from=builder step

That directive is harmless and actually makes your final image even more lightweight: only the virtualenv is copied, with no build toolchains, cached wheels, or even pipenv in the final stage!

It does not utilize pipenv but also needs pip for installing (+1 extra step; pipenv lock + pip install is always slower than just pipenv install)

Generate the virtualenv with pipenv in the building stage!

FROM python:3 as builder
RUN pip install pipenv
COPY Pipfile* /
WORKDIR /
# The presence of a .venv folder triggers pipenv to use it by default
RUN mkdir /.venv
RUN pipenv install --deploy

FROM python:3-slim
COPY --from=builder /.venv /.venv
# Activate virtualenv! (Dockerfile comments must be on their own line)
ENV PATH=/.venv/bin:$PATH
WORKDIR /myapp
COPY src .
CMD ["python3", "app.py"]

It does not install system-wide, but into /venv, which is to be avoided inside a container

While not using venvs inside Docker is a common practice, there are still some benefits to them, and absolutely zero drawbacks. Stop listening to people saying venvs should not be used inside containers. Pipenv's current recommendation is to not do system-wide installs in containers: https://github.com/pypa/pipenv/pull/2762

Minor: Build pollutes intermediate-images cache more, and requires downloading both image variants..

Simply optimize your cache setup in the CI system.

BUT

For the love of God use the same platform for the building and final stages.

Either

  • use python:3-alpine for both stages, bloating the building stage with as many apk packages as you need
  • settle for python:3 for building and python:3-slim for the final stage. It is not that big

Alpine images use musl instead of glibc, and that means a different ABI for Python packages: https://peps.python.org/pep-0656/. Do not mix alpine images with non-alpine images, just as you would not mix python:3.A images with python:3.B ones.

Otherwise some of the components installed in the building stage will be unusable by the final stage.
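
A minimal sketch of the first option, with python:3-alpine in both stages. It reuses the Dockerfile from this answer; the `apk add` line is an assumption about what a build might need (`build-base` is Alpine's compiler toolchain meta-package), and your packages may need further `-dev` packages.

```dockerfile
FROM python:3-alpine AS builder
# Assumption: a C toolchain is enough to build your dependencies.
# Add further packages here as needed.
RUN apk add --no-cache build-base
RUN pip install --no-cache-dir pipenv
COPY Pipfile* /
WORKDIR /
# The presence of a .venv folder triggers pipenv to use it by default
RUN mkdir /.venv
RUN pipenv install --deploy

FROM python:3-alpine
COPY --from=builder /.venv /.venv
# Activate the virtualenv by putting it first on PATH
ENV PATH=/.venv/bin:$PATH
WORKDIR /myapp
COPY src .
CMD ["python3", "app.py"]
```

If a compiled package links against a shared library at runtime, the final stage would also need the matching runtime package installed with `apk add`; only the build toolchain can be left behind.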

Yaelyager answered 27/10, 2022 at 20:59 Comment(1)
This is a very solid method, and it eliminates files that would be left over from the build process even after uninstalling dependencies and packages that were added for the build step. I am using this for a production deployment. - Krystin
0

My solution does not use pipenv, however, it builds a really lightweight base image using an intermediate Ubuntu stage to install system dependencies and later copy the output to the final python image.

My application needs to run on a Debian-based OS (Ubuntu) with Python 3.7.

Though I'm still exploring, the application runs without issues.

Without using Ubuntu to install system dependencies, the output image from python:3.7-slim was more than 700-800 MB; with this approach, it is barely 350-400 MB.

# Build Python packages through an intermediate stage based on ubuntu.
FROM ubuntu:20.04 as builder
RUN apt-get update -y
ARG DEBIAN_FRONTEND=noninteractive

# Install system dependencies required to install a few python packages
# These could be different based on the python package you want to install
RUN apt-get install wget -y
RUN apt-get install libtool build-essential autoconf automake pkg-config libtool-bin -y
RUN apt-get update && apt-get install -y cmake python3-dev

# In my case, I need the python packages `fb-re2` & `zeromq`
RUN apt-get install -y libre2-dev
# Install system dependency for `libzmq`
WORKDIR /zeromq
RUN wget -O zeromq-4.3.2.tar.gz https://github.com/zeromq/libzmq/releases/download/v4.3.2/zeromq-4.3.2.tar.gz &&\
    tar -xzf zeromq-4.3.2.tar.gz
WORKDIR /zeromq/zeromq-4.3.2
RUN ./autogen.sh && ./configure && make && make install


# Update package lists and install necessary dependencies
RUN apt-get update && apt-get install -y \
    software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa

# Install Python 3.7 and pip
RUN apt-get update && apt-get install -y \
    python3.7 \
    python3.7-dev \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Set python3.7 as the default python
RUN ln -s /usr/bin/python3.7 /usr/bin/python
RUN apt-get update && apt-get install python3.7-distutils -y
RUN apt-get install locate tcpdump -y
RUN python -m pip install --no-binary=:all: pyzmq==18.0.2
RUN python -m pip install fb-re2

# Install a couple of other python libraries
RUN python -m pip install --no-binary=:all: psutil==5.6.7
RUN python -m pip install --no-binary=:all: netifaces==0.11.0

### Final image build ###
# Build the base image for python based applications like a webserver.
FROM python:3.7.16-slim-buster

### From the intermediate build stage, copy the generated outputs to our final python image
COPY --from=builder /usr/local/lib/libzmq* /usr/local/lib/
COPY --from=builder /usr/local/lib/python3.7/dist-packages/zmq/ /usr/local/lib/python3.7/site-packages/zmq
COPY --from=builder /usr/local/lib/python3.7/dist-packages/pyzmq-18.0.2.egg-info/ /usr/local/lib/python3.7/site-packages/pyzmq-18.0.2.egg-info

COPY --from=builder /usr/local/lib/python3.7/dist-packages/psutil/ /usr/local/lib/python3.7/site-packages/psutil
COPY --from=builder /usr/local/lib/python3.7/dist-packages/netifaces* /usr/local/lib/python3.7/site-packages/

RUN apt-get update && apt-get install -y libre2-dev
COPY --from=builder /usr/local/lib/python3.7/dist-packages/*re2* /usr/local/lib/python3.7/site-packages/
COPY --from=builder /usr/local/lib/python3.7/dist-packages/fb_re2-1.0.7.dist-info/ /usr/local/lib/python3.7/site-packages/fb_re2-1.0.7.dist-info
## END of copying from intermediate ubuntu stage

WORKDIR /my/work/dir/
COPY requirements.txt .

# Install other python packages through requirements.txt
# It will disregard any package installation failure
# Logs can later be found in `/var/log/python_package_installation.log` inside the container
RUN set -e && cat requirements.txt | xargs -n 1 pip install > /var/log/python_package_installation.log 2>&1 || true

EXPOSE 18000
# Let's run a simple python server, so that we can go to the
# container's shell and debug ourselves
CMD python3 -m http.server

Urana answered 25/7, 2023 at 16:27 Comment(0)
