How can I persist a docker image instance between stages of a GitLab pipeline?
In the last couple of weeks I have been setting up my first pipeline using the public shared runners on GitLab.com for a PHP project in a private repository. The pipeline is pretty simple at this point, defining two stages:

stages:
  - test
  - deploy

The test stage runs composer update -o to build the project dependencies, connects to a remote database server, and runs the Codeception testing framework to test the build and generate code coverage reports.

The deploy stage runs composer update --no-dev -o to rebuild the project with only the production dependencies, and uses rsync to push the files to the production web server.
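For reference, the two jobs boil down to roughly the following .gitlab-ci.yml (the image name, Codeception invocation, and rsync target here are placeholders, not my exact configuration):

```yaml
stages:
  - test
  - deploy

test:
  stage: test
  image: php:7.3-cli                 # placeholder image
  script:
    - composer update -o
    - php vendor/bin/codecept run --coverage

deploy:
  stage: deploy
  image: php:7.3-cli                 # placeholder image
  script:
    - composer update --no-dev -o
    # placeholder deploy target
    - rsync -az --delete ./ deploy@example.com:/var/www/project/
```

Each job starts from a fresh container, which is why the image pull and composer install happen twice.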

This is all working right now, but each stage repeats the whole process of pulling the Docker image, installing dependencies, and extracting the project from git. It seems like it would be a whole lot more efficient to load the Docker image and project once, then run the test and deploy stages one after the other on the same persistent build instance.

I realize that many times you do want to create a fresh instance for each stage, but for my project this feels rather inefficient in both time and server resources.

I could configure everything to run in a single stage, which would eliminate the redundant Docker image pulls, but I would lose the pipeline functionality in GitLab that shows which stages failed and makes later stages dependent on the success of the preceding ones.


From my review of the documentation and several related questions, it seems like this has to do with the architecture of the process: jobs are independent of each other (and can even be processed by different runners) and are organized into stages on a pipeline.

What I have is certainly workable (if a little slow), but I thought I would ask the question here in case there is something I am missing that would make this process more efficient while still retaining the CI pipeline functionality.

Finlay answered 21/6, 2019 at 15:12 Comment(4)
Could you share your full .gitlab-ci.yml? Also, check out GitLab CI's cache and artifacts to skip the full dependency build on every execution. See here. — Tremolite
@Tremolite - Thanks for the input. I have reviewed cache and artifacts, and do use the cache for the composer vendor folder, but the biggest time factor is pulling and running the Docker image, not the actual "build" of my application. I was able to find a better image (thecodingmachine/php:7.3-v2-cli) which allowed me to avoid having to compile PHP extensions each time and saves several steps in my yml file. I am thinking this is probably the best I can do when using the public shared runners. — Finlay
You could also build your own Docker image, based on the image mentioned above. If you install your custom packages etc., that might save some more time as well. — Tremolite
It isn't possible to persist the current state of the Docker image. The best you could do is cache a bunch of stuff that you need in the subsequent CI stage, but that isn't any more performant. You just need a Docker image that has removed the duplicate steps each build is taking. Steps to do so here. — Boldt
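The custom-image suggestion from the comments could look roughly like this Dockerfile (the base tag, user, and package list are assumptions for illustration):

```dockerfile
# Hypothetical custom CI image built on the image mentioned above;
# baking in extra tooling avoids reinstalling it on every pipeline run.
FROM thecodingmachine/php:7.3-v2-cli

# Pre-install anything every job needs (e.g. rsync for the deploy stage).
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends rsync \
    && rm -rf /var/lib/apt/lists/*
```

Pushing this image to a registry and referencing it in the `image:` key removes the per-job setup steps, though the image itself is still pulled for each job on shared runners.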

I know this is an old question, but I want to provide an answer for anyone who has the same issue.

There's a configuration option for the GitLab Runner application itself that controls when the runner will use a local copy of an image. If you manage and use your own runners (even when using GitLab.com) you have full control over this option, but if you use the shared runners provided by GitLab, you cannot change it.

Here are the three "pull policies" you can use:

  1. Never. The never pull policy instructs the runner to never pull images from Docker Hub or any other registry, and to use only images already present on the Docker host. This allows full control over the images and versions used by GitLab.
  2. If Not Present. The if-not-present policy instructs the runner to first check whether the image is available locally and, if so, to use it. Otherwise, it will pull the image from its registry.
  3. Always. The always policy instructs the runner to ignore any local images and pull from the registry every time the job runs.
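On a self-managed runner, the policy lives in the runner's config.toml under the Docker executor section; a minimal sketch (the runner name and image here are illustrative):

```toml
# /etc/gitlab-runner/config.toml (illustrative excerpt)
[[runners]]
  name = "my-project-runner"        # hypothetical name
  executor = "docker"
  [runners.docker]
    image = "thecodingmachine/php:7.3-v2-cli"
    # Reuse a locally cached image when one is available
    pull_policy = "if-not-present"
```

With if-not-present, the first job pulls the image and every subsequent job on that host starts from the local copy, which removes the repeated pull the question describes.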

For the shared runners on GitLab.com, the pull policy is set to always to serve the needs of most users. The solution to this issue is to register your own runner(s) for your projects (which you can run on AWS EC2, your laptop/workstation, etc.).

Here is the information on the available configuration options when running your own GitLab Runner.

Here are specific details on the available pull policies, and when to use them (or not).

Here is how to register a runner to your projects (or to your entire instance if using self-hosted GitLab).
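Registration itself is a one-time command on the machine that will run the jobs; roughly like this (the URL, token, and image are placeholders — the token comes from your project's CI/CD settings):

```shell
# Illustrative registration of a Docker-executor runner
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "PROJECT_TOKEN" \
  --executor "docker" \
  --docker-image "thecodingmachine/php:7.3-v2-cli" \
  --docker-pull-policy "if-not-present"
```

After registering, assign the runner to the project (or use tags) so your jobs are picked up by it instead of the shared runners.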

Centigram answered 17/8, 2021 at 20:47 Comment(0)
