Slow numpy and pandas imports on Google Cloud Run
I'm developing an API and deploying it on Google Cloud Run.

There is a prestart Python script that imports pandas and numpy. When I time the imports, numpy takes about 2 seconds and pandas about 4 seconds on Cloud Run, as opposed to less than 0.5 seconds on my local machine.

I'm using python:3.8-alpine as the base image for my Docker container (though I have also tried several non-Alpine images).

Here is the Dockerfile

FROM python:3.8-alpine

COPY requirements.txt ./

RUN apk add --no-cache --virtual build-deps g++ gcc gfortran make libffi-dev openssl-dev file build-base \
    && apk add --no-cache libstdc++ openblas-dev lapack-dev \ 
    && pip install --no-cache-dir uvicorn gunicorn fastapi \
    && export CFLAGS="-g0 -Wl,--strip-all -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" \
    && pip install --no-cache-dir --compile --global-option=build_ext --global-option="-j 16" -r requirements.txt \
    && rm -r /root/.cache \
    && find /usr/local/lib/python3.*/ -name 'tests' -exec rm -r '{}' + \
    && find /usr/local/lib/python3.*/site-packages/ \( -type d -a -name test -o -name tests \) -o \( -type f -a -name '*.pyc' -o -name '*.pyo' \) -exec rm -r '{}' + \
    && find /usr/local/lib/python3.*/site-packages/ -name '*.so' -print -exec /bin/sh -c 'file "{}" | grep -q "not stripped" && strip -s "{}"' \; \
    && find /usr/lib/ -name '*.so' -print -exec /bin/sh -c 'file "{}" | grep -q "not stripped" && strip -s "{}"' \; \
    && find /usr/local/lib/ -name '*.so' -print -exec /bin/sh -c 'file "{}" | grep -q "not stripped" && strip -s "{}"' \; \
    && rm -rf /usr/local/lib/python*/ensurepip \
    && rm -rf /usr/local/lib/python*/idlelib \
    && rm -rf /usr/local/lib/python*/distutils/command \
    && rm -rf /usr/local/lib/python*/lib2to3 \
    && rm -rf /usr/local/lib/python*/__pycache__/* \
    && rm -r /requirements.txt /databases.zip \
    && rm -rf /tmp/* \
    && rm -rf /var/cache/apk/* \
    && apk del build-deps g++ gcc make libffi-dev openssl-dev file build-base 

CMD ["python","script.py"]

requirements.txt :

numpy==1.20.0
pandas==1.2.1

and the execution python file script.py :

import time

ts = time.time()
import pandas
te = time.time()
print(te-ts)
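As a side note, CPython itself can break down where import time goes, via the `-X importtime` flag (available since Python 3.7). A quick check might look like this, using `json` as a stand-in for pandas:

```shell
# Ask CPython for a per-module import-time breakdown; it is written to stderr.
python3 -X importtime -c "import json" 2> importtime.log

# The last lines correspond to the top-level import and its cumulative cost.
tail -n 3 importtime.log
```

Running this inside the container versus locally would show whether the extra time is spread across many submodules or concentrated in a few shared-library loads.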

Are these slow imports to be expected? Or is there some Python import trick?

I have looked all over Stack Overflow and GitHub issues but found nothing similar to this behavior.

Thanks in advance.

Vastitude answered 4/2, 2021 at 14:8 Comment(5)
Maybe your Google Cloud Run instance (?) is just slow? How many CPU cores does it provide? What percentage of the CPU can you use? How much RAM does it have?Robledo
Hi, thanks for taking the time. I have tried several configurations, from 1 CPU core to 4 and from 512 MB to 8 GB of RAM. CPU usage remains low (< ~10%).Vastitude
Hi, because the functions are stateless, the execution environment is often initialized from scratch, which is known as a cold start. Use lazy initialization for infrequently used objects to defer the time cost and decrease cold start times. Refer here: cloud.google.com/blog/topics/developers-practitioners/…. I hope this helps!Majuscule
@Vastitude were you able to find a solution to this issue?Epiblast
Unfortunately, no... but I can say it has nothing to do with the machine's cores or RAM.Vastitude
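Picking up the lazy-initialization suggestion from the comments: one common pattern is to defer the heavy import until the first request that actually needs it, so the cold start doesn't pay for it. A minimal sketch (shown with the stdlib `json` so it runs anywhere; in the real app the name would be `"pandas"`):

```python
import importlib

_modules = {}

def lazy_import(name):
    """Import `name` on first use and cache the module object.

    The first call pays the full import cost; later calls are dict lookups.
    """
    if name not in _modules:
        _modules[name] = importlib.import_module(name)
    return _modules[name]

# Inside a request handler you would call e.g. lazy_import("pandas").
mod = lazy_import("json")
```

This doesn't make the import faster, but it moves the cost out of the container's startup path, which is what Cloud Run's cold-start timing measures.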
This is a known issue in the Python ecosystem.

all modules are imported at runtime, and some modules are 300-500 MB in size

There are tons of complaints about slow import times. The best thread is this one: improving speed of Python module import

Regarding Cloud Run, I have experimented with various approaches, and nothing reduced the slowness drastically.

If you deploy in a serverless environment, or any other ecosystem with cold starts, be aware that the cold start can be on the order of 10 seconds because of these imports.

 importing pandas took 1.42 seconds
 importing numpy took 1.90 seconds
 importing torch took 2.84 seconds 
 importing torchvision took 0.78 seconds
 importing IPython took 1.22 seconds
 importing sklearn took 1.51 seconds
 importing dask took 0.74 seconds

Try 1:

No solution with bumping up the CPU to maximum possible

Try 2:

No speed improvement from rewriting the imports as:

import imp  # deprecated; removed in Python 3.12

pd = imp.load_module("pandas", None, "/usr/local/lib/python3.10/site-packages/pandas", ("", "", imp.PKG_DIRECTORY))

This way the interpreter skips the "finding" phase, but the timing stays the same, so there is no speed improvement.
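For reference, since `imp` is deprecated, the same finder-skipping load can be written with `importlib.util`. A sketch using the stdlib `colorsys` module as a stand-in, because the pandas path above is environment-specific:

```python
import importlib.util
import os
import sys
import sysconfig

# Path to a stdlib module file; for pandas this would be the
# site-packages package directory's __init__.py instead.
path = os.path.join(sysconfig.get_paths()["stdlib"], "colorsys.py")

# Standard importlib recipe: build a spec from an explicit location,
# so the interpreter skips the module-finding phase entirely.
spec = importlib.util.spec_from_file_location("colorsys", path)
module = importlib.util.module_from_spec(spec)
sys.modules["colorsys"] = module
spec.loader.exec_module(module)

print(module.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0)
```

As the answer notes, this only avoids the finder step; the dominant cost for pandas/numpy is executing the module and loading its compiled extensions, which this recipe still pays in full.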

Try 3:

No benefit from installing requirements with --compile:

RUN python -m pip install --no-cache-dir --compile -r requirements-prod.txt
RUN python -m compileall .

I even explored the container, and __pycache__ was built for all the modules as well as the app code, but there was no improvement in cold start time.

Summary:

A good read about a lazy load proposal is here
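The stdlib already ships a building block along these lines: `importlib.util.LazyLoader`, which defers actually executing a module until its first attribute access. A sketch adapted from the `importlib` docs (shown with `json`; in this thread the candidate would be pandas):

```python
import importlib.util
import sys

def lazy(name):
    """Return a module object whose real loading is deferred
    to the first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

heavy = lazy("json")          # returns almost immediately
print(heavy.loads("[1, 2]"))  # the real import happens here
```

This doesn't shrink the import cost, it just moves it off the cold-start path, which is usually the practical goal on Cloud Run.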

Demolition answered 18/7, 2022 at 19:51 Comment(3)
Thanks for your time! I'll give this lazy loading a look!Vastitude
It's been a few years but I'm seeing the same thing here, and I don't think it's the imports. I have functions with the same imports that are lightning fast, and when I print to the log it is very clearly the pd.read_parquet that is slow. From my local machine I can read a parquet from gcp in 200ms but from a cloud function on the same location it's taking 4-5 seconds when I thought it would be even faster.Yuk
Observing a 4-second "from google.cloud import compute" when executed in a Cloud Run instance (Python 3.10, 1 CPU). It definitely has something to do with Google's compute environment; not sure if there is any throttling in place while Cloud Run initializes the instance (startup).Tallia
Imports are slow; some are very slow. This is not exactly an answer to the question, but it lets you at least quantify the problem. The idea is to temporarily "hack" the Python module importer. Put this code before you import anything.

import time
import builtins

original_import = builtins.__import__

def timed_import(name, *args, **kwargs):
    start_time = time.time()
    result = original_import(name, *args, **kwargs)
    duration = time.time() - start_time
    if duration > 0.01:  # Only log imports that take more than 10 ms
        print(f"Importing {name} took {duration:.4f} seconds")
    return result

# Override the built-in import function with our timed import function
builtins.__import__ = timed_import

then, somewhere else in your code, restore the original importer:

 builtins.__import__ = original_import  # end of timed imports

In my case the vast majority of imports took a handful of microseconds, but a few stood out.
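A small variant of the same hack (hypothetical names) records durations in a dict so you can print a sorted summary at the end instead of interleaving output with your startup logs:

```python
import builtins
import time

durations = {}
original_import = builtins.__import__

def recording_import(name, *args, **kwargs):
    start = time.perf_counter()
    result = original_import(name, *args, **kwargs)
    # Accumulate, since the same module can be imported from several places.
    durations[name] = durations.get(name, 0.0) + (time.perf_counter() - start)
    return result

builtins.__import__ = recording_import
import encodings.idna  # stand-in for the application's real imports
builtins.__import__ = original_import

# Slowest imports first
for name, took in sorted(durations.items(), key=lambda kv: -kv[1])[:10]:
    print(f"Importing {name} took {took:.4f} seconds")
```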

    Importing pandas took 1.2163 seconds
    Importing pandas.core.api took 0.8030 seconds
    Importing flask took 0.4840 seconds
    Importing pandas._libs took 0.3502 seconds
    Importing pandas._libs.interval took 0.3396 seconds
    Importing google.cloud took 0.3394 seconds
    Importing serving took 0.2843 seconds
    Importing google.cloud.datastore took 0.2836 seconds
    Importing google.cloud.datastore.batch took 0.2836 seconds
    Importing pandas._libs.hashtable took 0.2679 seconds
    Importing pandas.core.groupby took 0.2362 seconds
    Importing pandas.compat took 0.2269 seconds 
    Importing pandas._libs.missing took 0.2164 seconds
    Importing pyarrow took 0.2067 seconds
    Importing pandas.compat.pyarrow took 0.2067 seconds
    Importing pandas._libs.tslibs.nattype took 0.2058 seconds
    Importing numpy took 0.1947 seconds
    Importing pandas.core.arrays took 0.1756 seconds
    Importing pandas._libs.tslibs.conversion took 0.1647 seconds
    Importing pandas.core.frame took 0.1645 seconds
    Importing numpy._core._multiarray_umath took 0.1541 seconds
    Importing numpy.__config__ took 0.1541 seconds 
    Importing numpy._core._multiarray_umath took 0.1237 seconds
    Importing pandas._libs.tslibs.offsets took 0.1235 seconds 
    Importing pandas.core.arrays.arrow took 0.1232 seconds
    Importing _ssl took 0.1288 seconds
    Importing ssl took 0.1288 seconds

Many of these are only required by specific parts of the product, so they should be good opportunities for threaded or lazy loading.

Felon answered 13/10 at 2:32 Comment(0)
