How to avoid reinstalling dependencies for each job in Gitlab CI

I'm using Gitlab CI 8.0 with gitlab-ci-multi-runner 0.6.0. I have a .gitlab-ci.yml file similar to the following:

before_script:
  - npm install

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js

This works but it means the dependencies are installed independently before each test job. For a large project with many dependencies this adds a considerable overhead.

In Jenkins I would use one job to install dependencies then TAR them up and create a build artefact which is then copied to downstream jobs. Would something similar work with Gitlab CI? Is there a recommended approach?

Sundown answered 4/11, 2015 at 15:54 Comment(1)
I customized my own Docker image with what I need. Does that work for you? – Linnea

Update: I now recommend using artifacts with a short expire_in. This is superior to cache because it only has to write the artifact once per pipeline whereas the cache is updated after every job. Also the cache is per runner so if you run your jobs in parallel on multiple runners it's not guaranteed to be populated, unlike artifacts which are stored centrally.


Gitlab CI 8.2 adds runner caching which lets you reuse files between builds. However I've found this to be very slow.

Instead I've implemented my own caching system using a bit of shell scripting:

before_script:
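  # unique hash of required dependencies (first field of md5sum output;
  # avoids the Bash-only array syntax, which fails under plain sh)
  - PACKAGE_HASH=$(md5sum package.json | cut -d' ' -f1)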
  # path to cache file
  - DEPS_CACHE=/tmp/dependencies_${PACKAGE_HASH}.tar.gz
  # Check if cache file exists and if not, create it
  - if [ -f $DEPS_CACHE ];
    then
      tar zxf $DEPS_CACHE;
    else
      npm install --quiet;
      tar zcf - ./node_modules > $DEPS_CACHE;
    fi

This will run before every job in your .gitlab-ci.yml and only install your dependencies if package.json has changed or the cache file is missing (e.g. first run, or file was manually deleted). Note that if you have several runners on different servers, they will each have their own cache file.
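
The same logic can be moved into a separate Bash file, which makes it easier to test outside CI. This is a sketch under a few assumptions: Bash with GNU `md5sum` and `tar` on the runner, a cache directory shared by jobs on the same machine (`DEPS_CACHE_DIR`, a hypothetical variable defaulting to `/tmp`), and an overridable install command (`INSTALL_CMD`, also hypothetical, defaulting to `npm install --quiet`). Unlike the inline version, it writes the archive to a temporary file and renames it, so a job running in parallel on the same machine never unpacks a half-written tarball.

```shell
#!/usr/bin/env bash
# Sketch: restore node_modules from a tarball keyed on the md5 of package.json,
# or install and create that tarball if it does not exist yet.
# DEPS_CACHE_DIR and INSTALL_CMD are hypothetical knobs for this example.

restore_or_install_deps() {
  local cache_dir="${DEPS_CACHE_DIR:-/tmp}"
  local install_cmd="${INSTALL_CMD:-npm install --quiet}"
  local hash cache tmp

  [ -f package.json ] || { echo "package.json not found" >&2; return 1; }
  # md5sum prints "<hash>  <file>"; keep only the hash
  hash=$(md5sum package.json | cut -d' ' -f1)
  cache="${cache_dir}/dependencies_${hash}.tar.gz"

  if [ -f "$cache" ]; then
    # Cache hit: unpack the previously archived ./node_modules
    tar zxf "$cache"
  else
    # Cache miss: install, then archive into a temp file in the same
    # directory and rename it, so readers only ever see a complete archive
    $install_cmd || return 1
    tmp=$(mktemp "${cache_dir}/dependencies_${hash}.XXXXXX") || return 1
    if tar zcf "$tmp" ./node_modules; then
      mv "$tmp" "$cache"
    else
      rm -f "$tmp"
      return 1
    fi
  fi
}
```

In `.gitlab-ci.yml` the `before_script` would then shrink to something like `- . ci/deps.sh && restore_or_install_deps` (the path `ci/deps.sh` is hypothetical). Because the temporary file lives in the same directory as the final archive, `mv` is a rename within one filesystem and therefore atomic.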

You may want to clear out the cache file on a regular basis in order to get the latest dependencies. We do this with the following cron entry:

@daily               find /tmp -maxdepth 1 -name 'dependencies_*' -type f -mtime +1 -delete
Sundown answered 3/12, 2015 at 17:33 Comment(3)
I'm using a different approach: an ln -s from a backup directory to node_modules in before_script, and an rm node_modules in after_script. This is much faster than a GitLab artifact or a zip. Also, by using a GitLab environment with on_stop, you can delete the backup directory when the branch is deleted. – Bartram
How does this work if you bump the Node version from 6 to 8, for example? I'm guessing it will fail, because the cache file name only depends on the hash of package.json. If you set engines accordingly in package.json, however, it will work, since changing engines changes the hash. – Heaven
Zipping the node_modules folder is still faster than using artifacts or caching. Artifacts upload the entire contents to GitLab, and caching is still slow. GitLab Runner also sometimes seems to crawl on "Reinitialized existing Git repository", taking up to 30 seconds on a small project. – Kirov

EDIT: This solution was recommended in 2016. In 2021, you might consider the caching docs instead.

A better approach these days is to make use of artifacts.

In the following example, the node_modules/ directory is immediately available to the lint job once the build stage has completed successfully.

build:
  stage: build
  script:
    - npm install -q
    - npm run build
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 week

lint:
  stage: test
  script:
    - npm run lint
Buttock answered 27/9, 2016 at 7:35 Comment(6)
The artifacts show the download options on the pipeline page. Can we avoid that? – Red
I don't believe so, although this issue may be worth following: gitlab.com/gitlab-org/gitlab-ce/issues/29757 – Buttock
DO NOT USE ARTIFACTS FOR CACHING! Use cache: docs.gitlab.com/ee/ci/caching. Thumbs down. – Typewrite
As mentioned by @danger89, GitLab recommend using artifacts instead of caching for this use case. – Sezen
Indeed:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

P.S. Also avoid caching the node_modules folder; cache the .npm folder instead, and prefer the offline cache. – Typewrite
expire_in needs one more level of indentation. – Hujsak

From the docs:

  • cache: Use for temporary storage for project dependencies. Not useful for keeping intermediate build results, like jar or apk files. Cache was designed to be used to speed up invocations of subsequent runs of a given job, by keeping things like dependencies (e.g., npm packages, Go vendor packages, etc.) so they don’t have to be re-fetched from the public internet. While the cache can be abused to pass intermediate build results between stages, there may be cases where artifacts are a better fit.

  • artifacts: Use for stage results that will be passed between stages. Artifacts were designed to upload some compiled/generated bits of the build, and they can be fetched by any number of concurrent Runners. They are guaranteed to be available and are there to pass data between jobs. They are also exposed to be downloaded from the UI. Artifacts can only exist in directories relative to the build directory and specifying paths which don’t comply to this rule trigger an unintuitive and illogical error message (an enhancement is discussed at https://gitlab.com/gitlab-org/gitlab-ce/issues/15530 ). Artifacts need to be uploaded to the GitLab instance (not only the GitLab runner) before the next stage job(s) can start, so you need to evaluate carefully whether your bandwidth allows you to profit from parallelization with stages and shared artifacts before investing time in changes to the setup.

So I use cache. When I don't need to update the cache (e.g. the build folder in a test job), I use policy: pull (see here).

Placeeda answered 21/6, 2019 at 16:3 Comment(0)

I prefer to use cache because it removes the files when the pipeline has finished.

Example

image: node

stages:
  - install
  - test
  - compile

cache:
  key: modules
  paths:
    - node_modules/

install:modules:
  stage: install
  cache:
    key: modules
    paths:
      - node_modules/
  after_script:
    - node -v && npm -v
  script:
    - npm i

test:
  stage: test
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
  before_script:
    - node -v && npm -v
  script:
    - npm run test

compile:
  stage: compile
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
  script:
    - npm run build
Ava answered 14/8, 2018 at 20:16 Comment(0)

I solved the problem with a symbolic link to a folder outside the working directory. The solution looks like this (PowerShell, on a Windows runner):

# .gitlab-ci.yml
before_script:
  - New-Item -ItemType SymbolicLink -Path ".\node_modules" -Target "C:\GitLab-Runner\cache\node_modules"
  - yarn

after_script:
  - (Get-Item ".\node_modules").Delete()

I know this is a rather dirty solution, but it saves a lot of time in the build process and extends the storage life.
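
On a Linux runner with the shell executor, the same idea (also described in a comment on an earlier answer, using ln -s and rm) might look like the Bash sketch below. The default directory `/var/cache/gitlab-ci/node_modules` is a hypothetical path that must be writable by the user the runner runs as. Since every job on that runner shares one directory, concurrent jobs that need different dependency versions can overwrite each other's packages, which is part of what makes this approach "dirty".

```shell
#!/usr/bin/env bash
# Sketch for a Linux shell-executor runner: keep node_modules in a
# persistent directory outside the build directory and link it in.
# The default path below is hypothetical; adjust it to your runner.

link_node_modules() {
  local target="${1:-/var/cache/gitlab-ci/node_modules}"
  mkdir -p "$target" || return 1
  # Drop any leftover link or directory from a previous job;
  # rm -rf on a symlink (no trailing slash) removes only the link itself
  rm -rf ./node_modules
  ln -s "$target" ./node_modules
}

unlink_node_modules() {
  # Remove the link only, never the shared directory it points to
  if [ -L ./node_modules ]; then
    rm ./node_modules
  fi
}
```

GitLab runs after_script in a separate shell from before_script and script, so the file has to be sourced in both places, e.g. `- . ci/link.sh && link_node_modules` in before_script and `- . ci/link.sh && unlink_node_modules` in after_script (the path `ci/link.sh` is hypothetical).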

Mullite answered 8/6, 2021 at 2:46 Comment(0)

I think it's not recommended, because all jobs of the same stage could be executed in parallel.

  1. First, all jobs of build are executed in parallel.
  2. If all jobs of build succeed, the test jobs are executed in parallel.
  3. If all jobs of test succeed, the deploy jobs are executed in parallel.
  4. If all jobs of deploy succeed, the commit is marked as success.
  5. If any of the previous jobs fails, the commit is marked as failed and no jobs of further stages are executed.

I have read that here:

http://doc.gitlab.com/ci/yaml/README.html

Odine answered 4/11, 2015 at 16:10 Comment(2)
Yes, but couldn't you have one build stage job that installs the dependencies, then any number of test stage jobs that use those same files? – Sundown
In that case, I suppose you can do it, but I don't know whether you will run into problems with the previously installed dependencies. One option could be to define a bash script, run it in your test job (- sh script.sh), and manage the installations inside that script. – Odine

GitLab introduced caching to avoid redownloading dependencies for each job.

The following Node.js example is inspired by the caching documentation.

image: node:latest

# Cache modules in between jobs
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js

Note that the example uses npm ci. This command is similar to npm install, but designed to be used in automated environments. You can read more about npm ci, and the command-line arguments you can pass to it, in the documentation.
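
npm ci exits with an error when the project has no package-lock.json (or npm-shrinkwrap.json), so a project that does not commit a lockfile cannot use the example as-is. Below is a minimal Bash sketch that falls back to npm install in that case while keeping the same .npm cache directory and --prefer-offline flag; the function name `install_deps` and the `NPM_CACHE_DIR` override are hypothetical, chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: use the reproducible "npm ci" when a lockfile exists,
# otherwise fall back to "npm install". Both use the project-local
# cache directory that the GitLab cache keeps between jobs.
install_deps() {
  # NPM_CACHE_DIR is a hypothetical override; defaults to .npm as in the example
  local cache_dir="${NPM_CACHE_DIR:-.npm}"
  if [ -f package-lock.json ] || [ -f npm-shrinkwrap.json ]; then
    npm ci --cache "$cache_dir" --prefer-offline
  else
    npm install --cache "$cache_dir" --prefer-offline
  fi
}
```

Assuming the job's shell is Bash (as in the node image), the before_script line could become `- . ci/install_deps.sh && install_deps`, where `ci/install_deps.sh` is a hypothetical path. Committing the lockfile and keeping plain npm ci remains the more reproducible choice.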

For further information, check Caching in GitLab CI/CD and the cache keyword reference.

Connie answered 20/12, 2021 at 13:15 Comment(1)
Pushing and pulling the cache takes a lot of time for large projects. – Wieldy

© 2022 - 2024 — McMap. All rights reserved.