Gitlab CI: npm doesn't like the cached node_modules

The internet is full of complaints about GitLab not caching, but in my case I think GitLab CI is actually caching correctly. The problem is that npm seems to install everything again anyway.

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - vendor/
    - bootstrap/
    - node_modules/

build-dependencies:
  image: ...
  stage: build
  script:
  - cp .env.gitlab-testing .env
  - composer install --no-progress --no-interaction
  - php artisan key:generate
  - npm install
  - npm run prod
  artifacts:
    paths:
    - vendor/
    - bootstrap/
    - node_modules/
    - .env
    - public/mix-manifest.json
  tags:
  - docker

This is my gitlab-ci.yml file (well, the relevant part). While the cached composer dependencies are used, the node_modules aren't. I even added everything to both cache and artifacts out of desperation.

Mini answered 26/3, 2019 at 14:37 Comment(0)

Updated Answer (Dec 2, 2023, GitLab@^15.12 & >13)

As the comments point out, the use of artifacts in the original answer is not ideal, but it was the cleanest approach that worked reliably. Now that GitLab's documentation around the use of cache has been updated, and cache has been expanded to support multiple cache keys per job (a maximum of 4, unfortunately), there is a better way to handle node_modules across a pipeline.

The rationale for this implementation is based on understanding the quirks of both GitLab and how npm works. These are the fundamentals:

  1. NPM recommends the use of npm ci instead of npm install when running in a CI/CD environment. Note that this requires package-lock.json to exist; it ensures no packages are automatically version-bumped while running in a CI environment (npm i by default will not create the same deterministic build every time, such as on a job re-run).

  2. npm ci deliberately removes the entirety of node_modules first before re-installing all packages listed in package-lock.json. Therefore, it is best to only configure GitLab to run npm ci once and ensure the resulting node_modules is passed to other jobs.

  3. NPM has its own cache, stored at ~/.npm/ by default, for offline builds and overall speed. You can specify a different cache location with the --cache <dir> option (you will need this). (variation of @Amityo's answer)

  4. GitLab cannot cache any directory outside of the repository! This means the default cache directory ~/.npm cannot be cached.

  5. GitLab's global cache configuration is applied to every job by default. Jobs need to explicitly override the cache config if they don't need the globally cached files. Using YAML anchors, the global cache config can be copied & modified.

  6. To run additional npx or npm run <script> without re-running an install, you should cache node_modules/ folder(s) across the pipeline.

  7. GitLab expects users to use the cache feature to handle dependencies and to use artifacts only for dynamically generated build results. This answer now supports that intent better than was possible before. Note the restriction that artifacts must be smaller than the maximum artifact size (1GB compressed on GitLab.com), and artifacts count against your storage usage quota.

  8. The use of the needs or dependencies keywords influences whether artifacts from a previous job are downloaded (or deleted) automatically in the next job.

  9. GitLab caches can use the hash of a file as the key, so it is possible for the cache to update only when package-lock.json updates. You could use package.json instead, but you would invalidate your deterministic builds since it does not change when minor or patch versions become available.

  10. If you have a mono-repo with more than 2 separate packages, you will hit the cache entry limit of 4 during the install job. You will not have the ideal setup, but you can combine some cache definitions. It is also worth noting that GitLab's cache:key:files supports a maximum of 2 files for the key hash, so you will likely need another method to determine a useful key. One likely solution is to use a non-file-related key and cache all node_modules/ folders under that key (a sketch of this is shown after this list). That way you have only 2 cache entries for the install job and 1 for each subsequent job.

  11. GitLab's cache list unfortunately prepends the index value to the cache key it uses (this is very unintuitive!), which can affect whether your cache is matched between jobs. Thank you to @deanharber for bringing this up. As of 3 Dec 2023, this example has been updated to reflect the new order in the install job. The global cache entry must be first in the install job since the global cache key is actually 0_[CACHE_KEY_FOR_NODE_MODULES], and the npm cache second as it will have a key of 1_[CACHE_KEY_FOR_NPM_TARBALLS].
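
A minimal sketch of the combined-key idea from point 10, assuming a hypothetical mono-repo layout with packages under core/pkg1 and core/pkg2 (the paths and the pipeline-scoped key are illustrative, not part of the original config):

# hypothetical mono-repo variant: one non-file key covering every
# node_modules/ folder, keeping the install job within the 4-cache-entry limit
cache:
  - key: node-modules-$CI_PIPELINE_ID
    paths:
      - node_modules/
      - core/pkg1/node_modules/
      - core/pkg2/node_modules/
    policy: pull   # only the install job would override this to pull-push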

Solution

  • Run a single install job in the .pre stage, using cached downloaded packages (tar.gz's) across the entire repository.
  • Cache all node_modules/ folders for the following jobs in that pipeline execution. Do not allow any job except install to upload cache (this lowers pipeline run time & prevents unintended consequences).
  • Pass the build/ directory via artifacts on to other jobs only when needed.
# .gitlab-ci.yml

stages:
  - build
  - test
  - deploy


# global cache settings for all jobs
# Ensure compatibility with the install job
# goal: the install job loads the cache and
# all other jobs can only use it
cache:
    # most npm libraries will only have 1 entry for the base project deps
    - &global_cache_node_mods
      key:
          files:
              - package-lock.json
      paths:
          - node_modules/
      policy: pull  # prevent subsequent jobs from modifying cache

#   # ATTN mono-repo users: with only additional node_modules, 
#   # add up to 2 additional cache entries. 
#   # See limitations in #10. 
#   - key:
#         files:
#             - core/pkg1/package-lock.json
#     paths:
#         - core/pkg1/node_modules/
#     policy: pull # prevent jobs from modifying cache


install:
  image: ...
  stage: .pre   # always first, no matter if it is listed in stages
  cache:
    # Mimic &global_cache_node_mods config but override policy
    # to allow this job to update the cache at the end of the job
    # and only update if it was a successful job (#5)
    - <<: *global_cache_node_mods
      when: on_success
      policy: pull-push

#   # ATTN Monorepo Users: add additional key entries from 
#   # the global cache and override the policy as above but
#   # realize the limitations (read #10).
#   - key:
#      files:
#        - core/pkg1/package-lock.json
#      paths:
#        - core/client/node_modules/
#      when: on_success
#      policy: pull-push

    # store npm cache for all branches (stores download pkg.tar.gz's)
    # will not be necessary for any other job
    - key: ${CI_JOB_NAME}
      # must be inside $CI_PROJECT_DIR for gitlab-runner caching (#3)
      paths:
        - .npm/
      when: on_success
      policy: pull-push

# before_script:
#   - ...
  script:
    # define cache dir & use it npm!
    - npm ci --cache .npm --prefer-offline
#   # monorepo users: run secondary install actions
#   - npx lerna bootstrap -- --cache .npm/ --prefer-offline


build:
  stage: build
  # global cache settings are inherited to grab `node_modules`
  script:
    - npm run build
  artifacts:
    paths:
      - dist/           # wherever your build results are stored


test:
  stage: test
  # global cache settings are inherited to grab `node_modules`
  needs:
    # install job is not "needed" unless it creates artifacts
    # install job also occurs in the previous stage `.pre` so it
    # is implicitly required since `when: on_success` is the default
    # for subsequent jobs in subsequent stages
    - job: build
      artifacts: true      # grabs built files
  # dependencies: could also be used instead of needs
  script:
    - npm test


deploy:
  stage: deploy
  when: on_success # only if previous stages' jobs all succeeded
  # override inherited cache settings since node_modules is not needed
  cache: {}
  needs:
    - job: build
      artifacts: true      # grabs dist/
  script:
    - npm publish

GitLab's recommendation for npm can be found in the GitLab Docs.
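
For comparison, the docs (and the last comment below) lean toward an even leaner setup: cache only the .npm tarball cache and re-run npm ci in every job. Roughly along these lines, as a sketch rather than a verbatim copy of the docs:

image: node:latest

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline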


[DEPRECATED] Original Answer (Oct 27, 2021, GitLab<13.12)

All the answers I see so far give only half answers but don't actually fully accomplish the task of caching IMO.

In order to fully cache with npm & GitLab, you must be aware of the following:

  1. See #1 above

  2. npm ci deliberately removes the entirety of node_modules first before re-installing all packages listed in package-lock.json. Therefore, configuring GitLab to cache the node_modules directory between build jobs is useless. The point is to ensure no preparation hooks or anything else modified node_modules from a previous run. IMO, this is not really necessary in a CI environment, but you can't change it while keeping fully deterministic builds.

  3. See #3-#4 above

  4. If you have multiple stages, the global cache will be downloaded for every job. This is likely not what you want!

  5. To run additional npx commands without re-running an install, you should pass the node_modules/ folder as an artifact to other jobs.

[DEPRECATED] Solution

  • Run a single install job in the .pre stage, using cached downloaded packages (tar.gz's) across the entire repository.
  • Pass node_modules & the build directory on to other jobs only when needed

stages:
  - build
  - test
  - deploy

install:
  image: ...
  stage: .pre         # always first, no matter if it is listed in stages
  cache:
    key: NPM_DOWNLOAD_CACHE  # a single-key-4-all-branches for install jobs
    paths:
      - .npm/
  before_script:
    - cp .env.gitlab-testing .env
    - composer install --no-progress --no-interaction
    - php artisan key:generate
  script:
    # define cache dir & use it npm!
    - npm ci --cache .npm --prefer-offline
  artifacts:
    paths:
    - vendor/
    - bootstrap/
    - node_modules/
    - .env
    - public/mix-manifest.json

build:
  stage: build
  needs:
    - job: install         
      artifacts: true       # true by default, grabs `node_modules`
  script:
    - npm run build
  artifacts:
    paths:
      - dist/               # wherever your build results are stored

test:
  stage: test
  needs:
    - job: install
      artifacts: true      # grabs node_modules
    - job: build
      artifacts: true      # grabs built files
  script:
    - npm test

deploy:
  stage: deploy
  needs:
      # does not need node_modules so don't state install as a need
    - job: build
      artifacts: true      # grabs dist/
    - job: test            # must succeed
      artifacts: false     # not needed
  script:
    - npm publish
Forever answered 27/10, 2021 at 20:8 Comment(25)
Why did you use artifacts for node_modules though?Oviform
@Sagivb.g, because of the caching downfall (#5); instead, if it is an artifact you can control whether it is downloaded in the following jobs, because it is not always needed. Secondly, I only want node_modules to exist for the current pipeline, not across pipelines, branches, etc. You can set the timeout so the artifact only exists for 12 hrs for the .pre stage (I do this, but not included above) if you are worried about space.Forever
You can set a different cache policy for jobs that don't need to pull or push the cache, like pull or push only for certain jobs and no cache at all to prevent certain jobs using it. As for making it only available for a given pipeline, I guess you can provide the CI_PIPELINE_ID as the key. If your current setup works well for you then that's great, I just thought this info should be out there just in case :)Oviform
@Sagivb.g, I am unaware of this functionality for cache adjustments. Thanks for bringing it up :). If there's an alternative then it would be worth submitting an edit to include the alternative as I don't think our 2 solutions would be understandable/complementary in the same example. Thanks!Forever
Thanks for taking the time to write this. I wish gitlab would give more explicit advice. This whole thing is kind of confusing for me. I work in multiple gitlab projects, some on SaaS, some self hosted. I have serious issues on SaaS gitlab with shared runners, uploading artifacts takes FOREVER. On self-hosted, the speed is acceptable. I think overall, caching/artifacting node_modules should be avoided altogether, at least when you're on shared runners; cache .npm and do npm ci before each job instead. I think they suggest this in the link you shared.Skewback
If someone has suggestions to speed up artifact upload of node_modules on shared runners, I'm all ears. It won't perform for me and I've tried all the solutions such as FF_USE_FASTZIP. They didn't make any difference for me.Skewback
@boy, thank you for bringing up your issues with artifacts. To be clear, I don't particularly like doing it through artifacts but that is how i got it working consistently. You could try the cache policy modification @Sagivb.g recommends and let us know. The nice thing about .npm cache will prevent need of downloading outside of the cache so npm ci only unpacks. If you have to run it every time, you could speed npm ci up with --no-fund & --no-audit options.Forever
I think the network performance of SaaS runners was downgraded. I've read that cache goes to google API while artifacts hit the server. I notice a slow connection with other webservices too. I believe artifacting node_modules on shared runners is a no-go. For caching, I remember that while cache download was fast, upload is also slow, so cache policy might help. npm ci with a cache hit takes around 2 minutes for my monorepo. A startup time of 2 min on every job is fine for pipelines with not too many stages/jobs. When I have more jobs starting up I'll take a look at caching node_modules.Skewback
@boy, Im looking into a more modern use of cache policy since GitLab has since added some new features and docs on it since I originally wrote this answer, and will update if it works.Forever
I get npm not found on the build jobGoogol
@NikhilNanjappa, did you specify the image in the build job or specify image at the root of the config. I left it off as it is not relevant to the solution. If you are missing npm then you are missing node.Forever
@Forever - I think I had the image on the root levelGoogol
More or less unrelated question, but can we cache the composer dependencies in a way similar to the .npm/ folder?Wenwenceslaus
@BasvonBassadin, extremely likely however I am unfamiliar with how composer works and I only left it in the solution as it was in the original question. If you use the rules I laid out above for how GitLab works and apply it to composer you will likely find success. otherwise I recommend creating a separate question to answer composer separately.Forever
@Sagivb.g & \@boy, answer is now updated to use cache! Thank you for the comments.Forever
I had to slightly adapt the solution of Jul 30, 2022. Using the provided pipeline, the stages consuming the cache (e.g., build) would only download the cached ./npm folder, not node_modules/. This is because the global cache settings specify only one cache. Even though this is the node_modules/ cache, the stages would check out the wrong cache because the .npm/ cache is listed first in the install stage. Long story short: Either you remove the .npm/ cache from the install job, or you add it to the global cache settings, too. Still, thanks, @codejedi365, for this solution!Cruller
@Xantipus, do you have a reference of this issue? Your explanation seems unclear as the point of the global cache is there is only one for all the other jobs and install is the only one that is different because it overrides its job cache definition. if we remove the .npm/ cache then the pipeline would download the tarballs every pipeline run.Forever
If we rerun the pipeline on another branch, will the cache of node_modules be extracted?Goble
In my setup the order of the cache keys defined in the install job had to flip. Subsequent jobs were trying to retrieve the cache from 0_[CACHE_KEY_FOR_NODE_MODULES] where the install job created 1_[CACHE_KEY_FOR_NODE_MODULES]Aran
I had to combine @Cruller and @Aran comments to make it work. this means: add the .npm cache to the global cache and make sure that the order of the keys etc. in the global cache and the install stage is the same (e.g. start with .npm and then node_modules in both stages).Planetary
regarding the ordering issue mentioned by @Aran and @Planetary there is another workaround: apply a hardcoded prefix: to the cache, so that the automatic index based prefix is not generated. Reference: gitlab.com/gitlab-org/gitlab/-/issues/384390Moxley
@deanharber, thank you for reminding me of this problem where the cache key adds an index. I have updated the solution to re-order the cache entries to fix this problem. \@cpc, I haven't looked into prefix but that would totally be better. Thanks!Forever
@famas23, to your question back in March, if the pipeline is run on another branch, the npm cache is re-used, but node_modules/ is always rebuilt for the lifetime of the pipeline. It is always rebuilt due to npm ci, see point #2.Forever
Wouldn't it be even better to use the CI_COMMIT_REF_SLUG as cache key for node_modules, since they shouldn't be shared with subsequent pipelines? They can only be shared with jobs within the same pipeline. The reason is that otherwise you'll get node_modules from a previous run, which will need to be deleted before new dependencies can be installed. At least that's how I believe npm ci works.Pisa
I would really advise NOT caching the node_modules directory, since the additional time for zipping & uploading & downloading this cache alone is not worth it (the GitLab docs themselves only mention caching .npm). So just caching the .npm folder should be sufficient in combination with npm ci --cache .npm --prefer-offline. Whether you want to use the package-lock.json file or CI_COMMIT_REF_SLUG as cache key is up to your own preference.Destructionist

Actually it should work: your cache is set globally and your key refers to the current branch ${CI_COMMIT_REF_SLUG}...

This is my build and it seems to cache the node_modules between the stages.

image: node:latest

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
  - node_modules/
  - .next/

stages:
  - install
  - test
  - build
  - deploy

install_dependencies:
  stage: install
  script:
    - npm install

test:
  stage: test
  script:
    - npm run test

build:
  stage: build
  script:
    - npm run build

Scalage answered 20/6, 2019 at 15:21 Comment(4)
I had an install stage too until I found this gem: before_script: - if [ ! -d "node_modules" ]; then npm install; fi hope it works for ya!Hunchback
slightly shorter maybe [ -d "node_modules" ] || npm install ?Familial
@Familial This fails. An unescaped || is interpreted as a multiline block scalar indicator (see the quoted variant sketched after these comments).Postal
shouldn't the cache be specified only in the first "install" stage?Hopfinger
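
For anyone copying the one-liner from these comments, a minimal sketch of quoting it so YAML does not try to parse the leading bracket (or the pipes) itself:

install_dependencies:
  stage: install
  script:
    # quoted so YAML treats the whole command as a plain string
    - '[ -d "node_modules" ] || npm install'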

I had the same issue. For me the problem came down to the cache settings: by default the cache does not keep unversioned git files, and since we do not store node_modules in git, the npm files were not cached at all. So all I had to do was insert one line, "untracked: true", like below

cache:
  untracked: true
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - vendor/
    - bootstrap/
    - node_modules/

Now npm is faster, although it still needs to check whether things have changed; for me this still takes a couple of minutes, so I am considering having a specific job to do the npm install, but it has sped things up a lot already.

Margarite answered 1/7, 2020 at 10:10 Comment(2)
Are you sure? The documentation for cache:paths seems pretty clear that it will include anything in those paths regardless of whether they are tracked or not. docs.gitlab.com/ee/ci/yaml/#cachepathsGoodhumored
@MatrixFrog, at the time of writing I had noticed a big time saving by enabling the option. I cannot tell you more though :)Margarite

The default cache path is ~/.npm

To set the npm cache directory:

npm config set cache <path> --global

see here for more information
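
In a GitLab job, that might look roughly like the sketch below, pointing the npm cache at a directory inside the checkout so GitLab can cache it (the .npm/ path and the job name are assumptions, not part of this answer):

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .npm/

build-dependencies:
  script:
    # redirect the npm cache into the project dir, then install as usual
    - npm config set cache "$CI_PROJECT_DIR/.npm" --global
    - npm install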

Cookshop answered 26/3, 2019 at 21:10 Comment(1)
While this is totally not the answer to the OP's question, it still helped me. For npm noobs: node_modules is the finished product of npm install. Like a python virtualenv, maybe? I found that my solution should be to put the npm cache dir into the CI cache, not node_modules.Grosvenor

GitLab + NPM/Yarn Cache + React Firebase Hosting

stages:
  - test
  - build
  - deploy

default:
  image: node:21
  cache: # Cache modules in between jobs
    key:
      files:
        - yarn.lock
    paths:
      - node_modules/

##########################
# Firebase Preview Links #
##########################
preview_deploy:
  stage: test
  image: node:21
  # only:
  #   - merge_requests
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event' && ($CI_MERGE_REQUEST_TARGET_BRANCH_NAME == "develop")
  before_script:
    - npm install -g firebase-tools
  script:
    - yarn install --immutable --immutable-cache
    - yarn build
    - |
      echo "{\"commit\": \"https://gitlab.com/ORG/REPO/-/commit/${CI_COMMIT_SHA}\", \"ref\": \"${CI_COMMIT_REF_NAME}\", \"job\": \"https://gitlab.com/ORG/REPO/-/jobs/${CI_JOB_ID}\"}" > dist/build.json
    - firebase --project "${FIREBASE_PROJECT_ID}" --token "${FIREBASE_TOKEN}" hosting:channel:deploy "${CI_COMMIT_SHA}"
  environment:
    name: preview-staging

Buttery answered 28/3 at 12:31 Comment(0)
