How to enable / setup Dependency Caches for apt-get on BitBucket Pipelines

I am using the following code in my bitbucket-pipelines.yml file to remotely deploy code to a staging server.

image: php:7.1.1

pipelines:
  default:
    - step:
        script:
          # install ssh
          - apt-get update && apt-get install -y openssh-client
          # get the latest code
          # get the latest code
          - ssh user@server -F ~/.ssh/config "cd /path/to/code && git pull"
          # update composer
          - ssh user@server -F ~/.ssh/config "cd /path/to/code && composer update --no-scripts"
          # optimise files
          - ssh user@server -F ~/.ssh/config "cd /path/to/code && php artisan optimize"

This all works, except that each time the pipeline runs, the ssh client is downloaded and installed all over again (adding ~30 seconds to the build time). Is there a way I can cache this step?

And how can I go about caching the apt-get step?

For example, would something like this work (or what changes are needed to make the following work):

pipelines:
  default:
    - step:
        caches:
          - aptget
        script:
          - apt-get update && apt-get install -y openssh-client

definitions:
  caches:
    aptget: which ssh
Terpsichorean answered 30/8, 2017 at 13:40 Comment(0)

This is a typical scenario where you should use your own Docker image instead of one of the ones provided by Atlassian. (Or search for a Docker image which provides exactly this.)

In your simple case, this Dockerfile should be enough:

FROM php:7.1.1

RUN apt-get update && \
    apt-get install -y openssh-client

Then, create a Docker Hub account, publish the image, and reference it in bitbucket-pipelines.yml.
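
For reference, the pipeline then pulls your published image instead of the stock PHP one (youruser/php-ssh is a hypothetical name; use whatever repository you pushed to):

image: youruser/php-ssh:7.1.1

pipelines:
  default:
    - step:
        script:
          # openssh-client is already baked into the image, so no apt-get step is needed
          - ssh user@server -F ~/.ssh/config "cd /path/to/code && git pull"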

Solis answered 30/8, 2017 at 14:58 Comment(2)
Using an image which provides what you need is great, but IMHO creating a whole new docker image that will need to be kept up to date isn't worth it to shave 30 sec off of a build.Keen
That depends. When running the pipeline frequently, this can be much more beneficial than keeping an image up to date. On the other hand: not everybody is using a pipeline to create (for instance) Docker images which are used in production, but there are enough people – including me – who primarily use it to run unit tests, perform linting, code style checks etc. In this case, keeping the image up to date is often not critical.Solis

Unfortunately, the parts that take the time are unsafe or pointless to cache. Remember that the pipeline caches may be deleted at any time, so you always need to run the commands anyway.

apt-get update doesn't use a cache, so it will download the latest package indexes every time.

apt-get install caches downloaded packages in /var/cache/apt, so you could save that. However, this probably won't actually save any time:

Fetched 907 kB in 0s (998 kB/s)
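
If you want to try it anyway, saving that directory is just a custom cache definition (apt-cache is an illustrative name; note the comment below about the directory staying empty unless docker-clean is disabled):

definitions:
  caches:
    apt-cache: /var/cache/apt

pipelines:
  default:
    - step:
        caches:
          - apt-cache
        script:
          - apt-get update && apt-get install -y openssh-client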

The actual installed packages cannot be cached, because they a) are spread around multiple shared files and directories and b) may not be portable to different docker images.

At a deeper level, satisfactory interaction between caching, apt-get update, and Docker is a complex issue.

Uribe answered 15/5, 2018 at 9:51 Comment(1)
apt-get update does use a cache: it is /var/lib/apt/lists and it speeds up updates. As for /var/cache/apt, in a docker image it will always be empty unless you tweak /etc/apt/apt.conf.d/*. Please see https://mcmap.net/q/548637/-how-to-enable-setup-dependency-caches-for-apt-get-on-bitbucket-pipelines instead.Graphic

I am using a similar configuration, but in my case I want to cache the gettext package; I came here for the same reason (to find out how to cache gettext).

If you don't have that dependency, you can use the Bitbucket-provided SSH pipe (pipe: atlassian/ssh-run); you don't have to create a custom Docker image.

image: atlassian/default-image:2

pipelines:
  branches:
    develop:
      - step:
          deployment: staging
          script:
              - apt update && apt install -y gettext
              - envsubst < scripts/deploy.sh > deploy-out.sh
              - pipe: atlassian/ssh-run:0.2.6
                variables:
                  SSH_USER: $STAGE_USER
                  SERVER: $STAGE_SERVER
                  COMMAND: 'deploy-out.sh'
                  MODE: 'script'
Pomerleau answered 6/8, 2020 at 4:8 Comment(0)

TL;DR

  1. Avoid apt. Use apt-get
  2. rm /etc/apt/apt.conf.d/docker-clean
  3. Cache /var/lib/apt/lists/
  4. Cache /var/cache/apt/

image: debian

definitions:

  caches:
    apt-lists: /var/lib/apt/lists
    apt-cache: /var/cache/apt

  yaml-anchors:
    - &debian-setup-script >-
        rm /etc/apt/apt.conf.d/docker-clean
        && apt-get update
        && apt-get install --yes <the-packages>

pipelines:

  default:
    - step:
        caches:
          - apt-lists
          - apt-cache
        script:
          - *debian-setup-script
          - do your thing

Long story:

  1. By default, apt install will autoclean downloaded .deb files unless the installation failed. Avoid it; use apt-get install instead. https://askubuntu.com/a/794987

  2. Most Debian-like Docker images you will encounter probably stem from https://github.com/debuerreotype/debuerreotype . They have sensible optimizations to reduce image layer sizes for the bulk of users. But our situation in Bitbucket Pipelines is totally different: a populated cache is welcome and will be restored in subsequent executions. Remove anything in /etc/apt/apt.conf.d/* that might be autocleaning APT's cache, namely /etc/apt/apt.conf.d/docker-clean.

  3. apt-get update will litter the /var/lib/apt/lists folder. Keep its contents! Future update runs will still reach the remote repositories (and they should), but there will be no download if your lists are fresh.

  4. /var/cache/apt is the well-known APT cache folder. Keep it! (A quick verification sketch follows this list.)
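
A quick way to verify that both caches are actually restored on a warm run is to print their sizes before installing (a sketch; openssh-client stands in for your packages):

pipelines:
  default:
    - step:
        caches:
          - apt-lists
          - apt-cache
        script:
          - rm /etc/apt/apt.conf.d/docker-clean
          # on a warm run these directories arrive pre-populated from the cache
          - du -sh /var/lib/apt/lists /var/cache/apt/archives || true
          - apt-get update
          - apt-get install --yes openssh-client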


Trying to cache the final installed files is useless, close to nonsense. Generally, the binary placed by a package relies on a bunch of libraries and files from the same and other packages, spread across the OS folder tree. Also, those files being present will not speed up update or install instructions in any way.

This means you are still bound to actually install the packages and run any post-installation scripts, potentially even having to build some sources. This is equivalent to how an npm or pip cache works, and is totally fine.


If you only want to SSH into a remote machine, you should consider the answer by @Rashi https://mcmap.net/q/548637/-how-to-enable-setup-dependency-caches-for-apt-get-on-bitbucket-pipelines instead.

Graphic answered 13/7, 2022 at 0:23 Comment(0)
