Convenient way to cache Node.js dependencies in GitLab CI/CD
I'm using npm to install Node.js dependencies in my project. I want to cache Node.js packages (node_modules) globally to speed up jobs in pipelines when deploying to Heroku. An example from the official docs of GitLab:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
  - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

And here is another example in GitLab:

cache:
  paths:
    - node_modules/

I found some articles (Deploy Node.js App with GitLab CI/CD, Continuous Integration with Node.js, Heroku and GitLab CI/CD - Part 2) that use the second configuration above. I gave it a shot and was able to deploy my app to Heroku successfully with these settings, but I'm not sure the caching mechanism is actually working.

What is the difference between these configurations? Which one is the most convenient way to cache Node.js packages?

My current setup for gitlab-ci.yml file:

image: node:latest

cache:
  paths:
    - node_modules/

stages:
  - build
  - deploy

build:
  stage: build
  script:
    - npm i
    - npm i -g gulp-cli
    - gulp build

deploy:
  image: ruby:latest
  stage: deploy
  script:
    - apt-get update -qy
    - apt-get install -y ruby-dev
    - gem install dpl
    - dpl --provider=heroku --app=$HEROKU_APP_NAME --api-key=$HEROKU_API_KEY
  only:
    - master

Not sure if I'm doing it the right way.

Podium answered 20/6, 2020 at 20:1 Comment(0)
It depends if you want to use npm install, aka npm i, or if you want to use npm ci.

npm install will first look for an existing node_modules folder and re-use it; if none exists, it will fetch the dependencies. Check the full algorithm.

npm ci, instead, removes any existing node_modules folder and performs a clean install of the dependencies. From the docs:

In short, the main differences between using npm install and npm ci are:

  • The project must have an existing package-lock.json or npm-shrinkwrap.json.
  • If dependencies in the package lock do not match those in package.json, npm ci will exit with an error, instead of updating the package lock.
  • npm ci can only install entire projects at a time: individual dependencies cannot be added with this command.
  • If a node_modules is already present, it will be automatically removed before npm ci begins its install.
  • It will never write to package.json or any of the package-locks: installs are essentially frozen.

Some tests having ~/.npm and node_modules populated:

$ npm i --prefer-offline
#...
updated 2 packages in 17.472s

$ rm -rf ~/.npm/ # removes global npm cache
$ npm i --prefer-offline
#...
up to date in 16.271s # removing the npm cache does not affect npm i

$ rm -rf node_modules/
$ npm i --prefer-offline
#...
added 2525 packages from 1197 contributors in 55.388s # removing node_modules affects npm i

$ npm ci --prefer-offline
#...
updated 2 packages in 17.201s

$ rm -rf ~/.npm/ # removes global npm cache
$ npm ci --prefer-offline
#...
added 2532 packages in 48.362s # removing the npm cache affects npm ci

$ rm -rf node_modules/
$ npm ci --prefer-offline
#...
added 2532 packages in 18.695s # removing node_modules does not affect npm ci

Summing up: npm ci has CI-oriented features that can be worth using, but if they bring no benefit in your case, just cache node_modules and use npm install instead.
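For the node_modules approach, a minimal `.gitlab-ci.yml` sketch could look like the following (the per-branch `key` and the `--prefer-offline` flag are my additions, not from the question's original setup):

```yaml
image: node:latest

# Cache node_modules per branch so npm install can re-use it on later runs.
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/

build:
  stage: build
  script:
    # npm install re-uses the restored node_modules/ and only
    # fetches packages that are missing or outdated.
    - npm install --prefer-offline
    - npm run build
```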

Corody answered 22/7, 2020 at 2:40 Comment(4)
Thanks for the detailed answer. OK, I now get the difference between npm i and npm ci. But please look at my current setup (gitlab-ci.yml file): which one is the most convenient in my situation? Or do I even need to cache at all? I have only 2 stages, and I use node_modules only in the first one.Hornpipe
These "performance benchmarks" are just wrong, as they demonstrate a lack of understanding. I'll critique one case to drive my point home -- In your last comparison, you remove node_modules and state it doesn't affect npm ci's speed. The reason is because your previous run of npm ci cached the remote dependencies, thus all npm had to do was read from the cache to recreate node_modules.Daugherty
Do not take me wrong @JordanCaras, but it is a lack of understanding to think my test is about performance. It demonstrates that the real difference is about immutability and which cache each of the commands uses, which helps to understand which one is more convenient.Corody
I think I get what you're doing here. The comment in your first example claims deleting the npm cache does not affect npm install. That's only the case because node_modules still exists in your scenario. If you had to recreate both, you'd see a runtime hit, because the npm cache is used when creating node_modules regardless of the command used. You seem to be illustrating how npm ci removes node_modules as its first step... fair enough. The comments just seem to declare things that aren't quite true in a box.Daugherty
You should be using npm ci over npm install regardless of the solution you choose.

In the GitLab recommended approach:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
  - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

You are caching all of the files that npm needs to install your project (and effectively create node_modules). Subsequent jobs will run before_script (npm ci) and re-generate node_modules from the cache. This will be fast if the cache exists.

The main advantage of this method is resilience: the cache is truly optional. If it doesn't download properly, or doesn't exist because of some TTL (maybe you are re-running an old job), your pipeline stage using npm ci will fetch all the necessary files from the internet. The operative argument is --prefer-offline.

You can code up a way to be resilient using the latter approach, but it'd be more complicated.
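One hedged sketch of such a resilient node_modules setup (the lockfile-based cache key and the existence check are my assumptions, not from the original posts):

```yaml
cache:
  # Hypothetical key choice: keying on the lockfile means the cached
  # node_modules/ is only restored when dependencies are unchanged.
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

before_script:
  # Re-install only when the cache was missing or failed to restore.
  # (npm ci would otherwise delete the restored node_modules/ anyway.)
  - '[ -d node_modules ] || npm ci'
```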

EDIT: To amend my answer, I suspect the latter approach is faster, as you don't need to rebuild node_modules/ in each stage, although I haven't verified this.

Daugherty answered 14/11, 2022 at 16:18 Comment(0)