gitlab-runner kubernetes cache is ignored

I have a Kubernetes cluster with gitlab-runner 10.3.0 using the Kubernetes executor. There is no cache_dir defined in the runner's config.toml file. Note that this is different from a Docker executor, so the volume-based solutions do not apply.

In a .gitlab-ci.yml, I configured a job to use the cache:

build:
  cache:
    key: "${PROJECT_NAME}"
    paths:
      - "node_modules/"
  script:
    - ls node_modules/ || echo "cache not there"
    - npm i
    - npm run build
    - ...

When I run this, I see the cache being pulled and created:

Cloning repository for some-branch with git depth set to 1...
Cloning into '/group/projectname'...
Checking out d03baa31 as some-branch...
Skipping Git submodules setup
Checking cache for projectname...
Successfully extracted cache
$ docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
//
// ...work being done here... 
//
Creating cache projectname...
node_modules/: found 24278 matching files
Created cache
Job succeeded

However, when I push another commit to this branch, the ls node_modules/ command still does not find the cache.

I searched the documentation and did not find any information on how to activate the cache. The gitlab-runner pod does not contain any of the supposedly cached files either, and according to the documentation, cache_dir in the config is not used by the Kubernetes executor.

But according to this feature page, the kubernetes executor does support cache.

So how do I get this working?

Klutz asked 7/2, 2018 at 18:13
Possible duplicate of Docker caching of .m2 doesn't work locally - Primogenitor

Due to the distributed nature of Kubernetes, you will need to configure a central cache location (typically an S3-compatible object storage such as AWS S3 or Minio). The reason is explained in the GitLab Runner documentation (emphasis mine):

To speed up your builds, GitLab Runner provides a cache mechanism where selected directories and/or files are saved and shared between subsequent builds.

This is working fine when builds are run on the same host, but when you start using the Runners autoscale feature, most of your builds will be running on a new (or almost new) host, which will execute each build in a new Docker container. In that case, you will not be able to take advantage of the cache feature.

To overcome this issue, together with the autoscale feature, the distributed Runners cache feature was introduced.

It uses any S3-compatible server to share the cache between used Docker hosts. When restoring and archiving the cache, GitLab Runner will query the S3 server and will download or upload the archive.

For this, you can use the [runners.cache] section in the runner configuration:

[runners.cache]
  Type = "s3"
  ServerAddress = "s3.amazonaws.com"
  AccessKey = "AMAZON_S3_ACCESS_KEY"
  SecretKey = "AMAZON_S3_SECRET_KEY"
  BucketName = "runners"
  BucketLocation = "eu-west-1"
  Insecure = false
  Path = "path/to/prefix"
  Shared = false
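Newer GitLab Runner releases (11.3.0 onward) moved the S3 settings into a dedicated [runners.cache.s3] table, and the flat keys shown above were later removed (in 12.0). On a current runner, the same idea looks roughly like the sketch below. It assumes a self-hosted Minio reachable from inside the cluster; the service address, credentials and bucket name are placeholders, not values from this answer.

```toml
[[runners]]
  executor = "kubernetes"
  [runners.cache]
    Type = "s3"
    Path = "path/to/prefix"
    # false: each runner keeps its own cache namespace in the bucket
    Shared = false
    [runners.cache.s3]
      # Placeholder: hypothetical in-cluster Minio Service address
      ServerAddress = "minio.gitlab-runner.svc:9000"
      # Placeholder credentials for the Minio instance
      AccessKey = "MINIO_ACCESS_KEY"
      SecretKey = "MINIO_SECRET_KEY"
      BucketName = "runners"
      # Talk to the server over plain HTTP; assumes Minio runs without TLS
      Insecure = true
```

On the 10.3.0 runner from the question, the flat [runners.cache] layout above is the one that applies.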

Edit by OP: Installation instructions for Minio for gitlab-ci

Holder answered 7/2, 2018 at 20:52
Quite misleading to only talk about S3 - I didn't know about the S3-compatible Minio. Thanks! See: docs.gitlab.com/runner/install/… - Klutz
The link for the Minio installation instructions has changed: docs.gitlab.com/runner/install/… Otherwise, for Kubernetes: docs.min.io/docs/deploy-minio-on-kubernetes.html - Halloran
Too many edits pending to make the change, but the new URL for the docs is here: docs.gitlab.com/charts/charts/minio/… - Stealage

Another option for implementing the cache with the Kubernetes executor is a PersistentVolumeClaim (PVC). I usually create the PVC in the gitlab-runner namespace using the efs.csi.aws.com driver and the ReadWriteMany access mode. Then, in config.toml:

[[runners]]
  executor = "kubernetes"
  cache_dir = "/cache"
  ...
  [[runners.kubernetes.volumes.pvc]]
    name = "gitlab-runner-cache"
    mount_path = "/cache"
  ...

Then in the job logs you can see:

Restoring cache
Checking cache for default-non_protected...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted. 
Successfully extracted cache

You can test it with, for example:

$ cat test.txt
Sat Oct 15 08:23:12 UTC 2022
Sat Oct 15 08:24:09 UTC 2022
Sat Oct 15 08:24:57 UTC 2022
Sat Oct 15 08:25:52 UTC 2022
$ echo $(date) >> test.txt
Saving cache for successful job
Creating cache default-non_protected...
test.txt: found 1 matching files and directories   
No URL provided, cache will not be uploaded to shared cache server. Cache will be stored only locally. 
Created cache
Ravid answered 15/10, 2022 at 8:49
Thank you! This seems to be a good option too. I'll try it in Azure. - Needlefish
Update: GitLab Runner now supports Azure Blob Storage natively -> docs.gitlab.com/runner/configuration/… - Needlefish
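For reference, the native Azure cache mentioned above is configured through a [runners.cache.azure] table. This is a minimal sketch assuming a storage account and a blob container already exist; the account name, key and container name are placeholders.

```toml
[[runners]]
  executor = "kubernetes"
  [runners.cache]
    Type = "azure"
    Path = "path/to/prefix"
    Shared = false
    [runners.cache.azure]
      # Placeholders: existing storage account and its access key
      AccountName = "AZURE_STORAGE_ACCOUNT_NAME"
      AccountKey = "AZURE_STORAGE_ACCOUNT_KEY"
      # Placeholder: blob container that will hold the cache archives
      ContainerName = "runners"
      # Default public Azure blob endpoint
      StorageDomain = "blob.core.windows.net"
```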
