Google Cloud Composer taking too long to install dependencies
I'm following the documentation for Google Cloud Composer to install Python dependencies from PyPI in an environment. I used this command to install the libraries from a requirements file:

$ gcloud composer environments update $ENV_NAME \
    --update-pypi-packages-from-file requirements.txt \
    --location us-east4

It was just a test and the requirements file only has 4 libraries, but the command takes more than 20 minutes to finish. So I tried the user interface and installed a single package from there, but it takes almost the same amount of time.

Something is not making sense to me: when I execute these commands, the environment enters an "updating" state and takes several minutes to be ready again. Why does Composer take so long to perform a pip install?

Has anyone faced a similar problem? How do you manage the installation of Python dependencies in Composer?

Butterfish answered 7/6, 2019 at 19:11 Comment(2)
What dependencies are you attempting to install? It's possible that one or more of them needs to be compiled from source.Orsa
They are "normal packages" like requests and urllib3, I tried to install them locally and it was very fast.Butterfish
The reason Cloud Composer environments take so long to update is because the service deploys Airflow in a distributed setup within Google Kubernetes Engine and App Engine (for the webserver). This means the service has to take care of building/rebuilding container images, redeploying them to your cluster, updating the webserver app, etc.

While this does mean installing packages or updating the environment can take a while, it's what makes Composer easy to use: it gives you a one-shot equivalent of pip install even if you have dozens of worker nodes.

Christiechristin answered 9/6, 2019 at 0:7 Comment(3)
Ok, that makes a lot of sense, I think. Like you said, for a big Composer environment with dozens of worker nodes. But for a small Composer deployment in a "development phase", this rebuild every time I need a new dependency can be very annoying.Butterfish
AFAIU it takes roughly 20 minutes for every kind of update operation, and I have 4 of them for an idempotent deployment: clean-configs, update-configs, clean-pypi-packages, and update-pypi-packages. So it takes roughly 80 minutes, and that isn't just annoying...Tantalic
It would help to have a way to provide a snapshot of these variables and apply them all at once, but it looks like there is no such feature so far.Tantalic
Pin the PyPI package versions in your requirements.txt file. This saves pip the time of checking all possible versions.

Pip resolves your PyPI dependencies with a backtracking strategy, which is very slow because it checks every possible combination of versions. Pinning the package versions in requirements.txt prevents pip from having to backtrack.

Check the build logs of your Cloud Composer instance, look at the final PyPI dependency versions that were installed, and pin those versions in your requirements.txt.

This way, only the first deployment (before you have identified the installed package versions) will be slow. Once you pin the versions, subsequent deployments will be much faster.
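For example, a fully pinned requirements.txt might look like the following (the package names match those mentioned in the question; the exact version numbers are illustrative and should be taken from your own build logs):

```text
requests==2.28.1
urllib3==1.26.12
chardet==5.1.0
idna==3.4
```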

Lurcher answered 22/9, 2022 at 14:44 Comment(1)
Even if you are only installing 4 dependencies, remember that Cloud Composer has over 30 dependencies preinstalled as well. All of those packages have to go through the backtracking strategy to resolve dependency conflicts.Lurcher
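As a quick sanity check before deploying, a small script can flag requirement lines that are not pinned to an exact version. This is a minimal sketch, not part of any Composer tooling; it only handles simple `name==version` lines, not the full requirements-file syntax:

```python
import re

# Matches a requirement pinned to an exact version, e.g. "requests==2.28.1"
# (optionally with extras, e.g. "requests[socks]==2.28.1").
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*\S+")

def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not PINNED.match(stripped):
            problems.append(stripped)
    return problems

if __name__ == "__main__":
    sample = ["requests==2.28.1", "urllib3>=1.26", "# a comment", "chardet"]
    # "urllib3>=1.26" and "chardet" are not pinned and get flagged.
    print(unpinned_requirements(sample))
```

Running it on your requirements file before `gcloud composer environments update` tells you which entries still leave pip room to backtrack.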
