Cache Cargo dependencies in a Docker volume
I'm building a Rust program in Docker (rust:1.33.0).

Every time code changes, it re-compiles (good), which also re-downloads all dependencies (bad).

I thought I could cache dependencies by adding VOLUME ["/usr/local/cargo"]. Edit: I've also tried moving this directory with CARGO_HOME, without luck.

I thought that making this a volume would persist the downloaded dependencies, which appear to be in this directory.

But it didn't work, they are still downloaded every time. Why?


Dockerfile

FROM rust:1.33.0

VOLUME ["/output", "/usr/local/cargo"]

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .
COPY src/ ./src/

RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

Built with just a plain docker build .

Cargo.toml

[package]
name = "mwe"
version = "0.1.0"
[dependencies]
log = { version = "0.4.6" }

Code: just hello world

Output of second run after changing main.rs:

...
Step 4/6 : COPY Cargo.toml .
---> Using cache
---> 97f180cb6ce2
Step 5/6 : COPY src/ ./src/
---> 835be1ea0541
Step 6/6 : RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
---> Running in 551299a42907
Updating crates.io index
Downloading crates ...
Downloaded log v0.4.6
Downloaded cfg-if v0.1.6
Compiling cfg-if v0.1.6
Compiling log v0.4.6
Compiling mwe v0.1.0 (/)
Finished dev [unoptimized + debuginfo] target(s) in 17.43s
Removing intermediate container 551299a42907
---> e4626da13204
Successfully built e4626da13204
Cryometer answered 1/3, 2019 at 21:59 Comment(4)
Can you post your Dockerfile and your docker build command? – Figurine
@JackGore I've reduced it to a minimal example and added it. – Cryometer
So do I understand properly that running cargo build -Z unstable-options --out-dir /output both builds the thing and downloads the dependencies? And that, if the dependency folder is already filled, it won't re-download them? – Vue
@b.enoit.be Yes, that command downloads and builds everything. It doesn't work without project code though, so that must be either added (invalidating the cache) or faked (which works, but I'd ideally like to avoid that complexity). – Cryometer

A volume inside the Dockerfile is counter-productive here. That would mount an anonymous volume at each build step, and again when you run the container. The volume during each build step is discarded after that step completes, which means you would need to download the entire contents again for any other step needing those dependencies.
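A minimal sketch of that discard behavior under the classic builder (alpine is just a small stand-in image; this is an illustration, not part of the fix):

```dockerfile
FROM alpine
VOLUME ["/data"]
# This write lands in an anonymous volume mounted only for this step
RUN echo cached > /data/file
# The classic builder discards that write when the step ends,
# so this step no longer finds the file and the build fails here
RUN cat /data/file
```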

The standard model for this is to copy your dependency specification, run the dependency download, copy your code, and then compile or run your code, in 4 separate steps. That lets docker cache the layers in an efficient manner. I'm not familiar with rust or cargo specifically, but I believe that would look like:

FROM rust:1.33.0

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .
RUN cargo fetch # this should download dependencies
COPY src/ ./src/

RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

Another option is to turn on some experimental features with BuildKit (available in 18.09, released 2018-11-08) so that docker saves these dependencies in what is similar to a named volume for your build. The directory can be reused across builds, but never gets added to the image itself, making it useful for things like a download cache.

# syntax=docker/dockerfile:experimental
FROM rust:1.33.0

VOLUME ["/output", "/usr/local/cargo"]

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .
COPY src/ ./src/

RUN --mount=type=cache,target=/root/.cargo \
    ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

Note that the above assumes cargo is caching files in /root/.cargo. You'd need to verify this and adjust as appropriate. I also haven't mixed the mount syntax with a json exec syntax to know if that part works. You can read more about the BuildKit experimental features here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md

Turning on BuildKit from 18.09 and newer versions is as easy as export DOCKER_BUILDKIT=1 and then running your build from that shell.
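For example, from a shell:

```shell
# Enable BuildKit for this shell session, then build as usual
export DOCKER_BUILDKIT=1
docker build .
```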

Sikhism answered 12/3, 2019 at 13:20 Comment(2)
The first way is problematic because Cargo currently lacks a way to build only the dependencies. It also only caches whole layers. The second way with BuildKit sounds very promising, but I couldn't test it because even with the env flag and /etc/docker/daemon.json I can't get it to parse on 18.09.3. – Cryometer
I got a lot of useful answers, but I think this one is the most useful: it is the first to explain why the anonymous volume didn't work, it suggests cargo fetch, and it describes what seems like it will be the best solution in half a year, even though I can't get it to work today. – Cryometer

I would say the nicer solution would be to resort to a Docker multi-stage build, as pointed out here and there.

This way you can create a first image that builds both your application and your dependencies, then use, in the second image, only the dependency folder from the first one.

This is inspired by both your comment on @Jack Gore's answer and the two issue comments linked above.

FROM rust:1.33.0 as dependencies

WORKDIR /usr/src/app

COPY Cargo.toml .

RUN rustup default nightly-2019-01-29 && \
    mkdir -p src && \
    echo "fn main() {}" > src/main.rs && \
    cargo build -Z unstable-options --out-dir /output

FROM rust:1.33.0 as application

# Those are the lines instructing this image to reuse the files 
# from the previous image that was aliased as "dependencies" 
COPY --from=dependencies /usr/src/app/Cargo.toml .
COPY --from=dependencies /usr/local/cargo /usr/local/cargo

COPY src/ src/

VOLUME /output

RUN rustup default nightly-2019-01-29  && \
    cargo build -Z unstable-options --out-dir /output

PS: having only one RUN will reduce the number of layers you generate; more info here

Bhili answered 13/3, 2019 at 1:28 Comment(5)
That's a smart trick too, thanks. Better than removing the fake source files. Though I'd still prefer to avoid the echo "fn main() {}" > src/main.rs (especially since my real project is a workspace with multiple Cargo.tomls). – Cryometer
You are a little bit stuck then. If you use your real main.rs in the dependencies image, then changing it will invalidate all subsequent cached layers, so the RUN command will rebuild your dependencies. That said: have you tested the cargo fetch command as suggested by @Sikhism? – Vue
Good point, I'd missed the fetch thing, actually. It does cache the downloads, but not the compiling, so it saves some time but it's limited. – Cryometer
Ah, then indeed. Cargo is definitely missing a cargo install command à la npm, yarn, or composer-php, then. – Vue
True, probably not for too much longer though :-) github.com/rust-lang/cargo/issues/2644 – Cryometer

Here's an overview of the possibilities. (Scroll down for my original answer.)

  • Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards remove the fake sources and add the real ones. [Caches dependencies, but requires several fake files with workspaces].
  • Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards create a new layer with the dependencies and continue from there. [Similar to above].
  • Externally mount a volume for the cache dir. [Caches everything, relies on caller to pass --mount].
  • Use RUN --mount=type=cache,target=/the/path cargo build in the Dockerfile in new Docker versions. [Caches everything, seems like a good way, but currently too new to work for me. Executable not part of image. Edit: See here for a solution.]
  • Run sccache in another container or on the host, then connect to that during the build process. See this comment in Cargo issue 2644.
  • Use cargo-build-deps. [Might work for some, but does not support Cargo workspaces (in 2019)].
  • Wait for Cargo issue 2644. [There's willingness to add this to Cargo, but no concrete solution yet].
  • Using VOLUME ["/the/path"] in the Dockerfile does NOT work, this is per-layer (per command) only.

Note: one can set ENV CARGO_HOME and ENV CARGO_TARGET_DIR in the Dockerfile to control where the download cache and the compiled output go.

Also note: cargo fetch can at least cache the downloading of dependencies, although not their compilation.

Cargo workspaces suffer from having to manually add each Cargo file and, for some solutions, from having to generate a dozen fake main.rs/lib.rs files. For projects with a single Cargo file, the solutions work better.
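Generating the workspace stubs can at least be scripted; a minimal sketch (the member names db and utils are placeholders):

```shell
# Create a stub crate root for each workspace member so that a
# dependency-only build layer can succeed before the real sources
# are copied in. Member names here are hypothetical.
for member in db utils; do
    mkdir -p "$member/src"
    echo "fn main() {}" > "$member/src/main.rs"
done
```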


I've got caching to work for my particular case by adding

ENV CARGO_HOME /code/dockerout/cargo
ENV CARGO_TARGET_DIR /code/dockerout/target

Where /code is the directory where I mount my code.

This is externally mounted, not from the Dockerfile.
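The invocation might look something like this (the image name mwe is a placeholder):

```shell
# Bind-mount the project at /code; with the ENV settings above,
# cargo writes its download cache and build output under
# /code/dockerout, which persists on the host between runs.
docker run --rm \
    --mount type=bind,src="$PWD",dst=/code \
    --workdir /code \
    mwe cargo build
```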

EDIT1: I was confused about why this worked, but @b.enoit.be and @BMitch cleared up that it's because volumes declared inside the Dockerfile only live for one layer (one command).

Cryometer answered 8/3, 2019 at 22:10 Comment(3)
Your original idea does not work because volumes in Docker are wiped between layers and, more importantly, whenever Docker creates a container out of an image (when you docker run or docker exec it). So your attempt at making a volume out of the dependency folder is actually what makes it fail to achieve what you intended. Containers are disposable (you should be able to destroy one and run your code on another, maybe even with another Rust version in it); that's what volumes are for: having a consistent data layer when/if you want to remove one container and reuse your data in a new one. – Vue
That was also explained in @Sikhism's answer, btw. – Vue
@b.enoit.be Yeah, that was my misunderstanding; thanks to you and BMitch for clearing that up. An external volume is a solution I might go with, though I'd prefer the Dockerfile to be more self-contained... – Cryometer

You do not need to use an explicit Docker volume to cache your dependencies. Docker will automatically cache the different "layers" of your image; basically, each command in the Dockerfile corresponds to a layer of the image. The problem you are facing stems from how Docker image layer caching works.

The rules that Docker follows for image layer caching are listed in the official documentation:

  • Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.

  • In most cases, simply comparing the instruction in the Dockerfile with one of the child images is sufficient. However, certain instructions require more examination and explanation.

  • For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.

  • Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.

So the problem is with the positioning of the command COPY src/ ./src/ in the Dockerfile. Whenever there is a change in one of your source files, the cache will be invalidated and all subsequent commands will not use the cache. Therefore your cargo build command will not use the Docker cache.

To solve your problem it should be as simple as reordering the commands in your Dockerfile, to this:

FROM rust:1.33.0

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .

RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

COPY src/ ./src/

Doing it this way, your dependencies will only be reinstalled when there is a change in your Cargo.toml.

Hope this helps.

Figurine answered 3/3, 2019 at 0:12 Comment(2)
Thanks for the answer! I explored caching inside the image, but since Cargo doesn't have a way to build dependencies only (the plugin doesn't work for workspaces), it doesn't work very well (the build command in your example fails for that reason). That's why I wanted to cache the data in a volume... – Cryometer
Perhaps I should just adapt your example and add a bunch of RUN mkdir -p src && echo "fn main() {}" > src/main.rs for every workspace member. I just thought the volume thing was gonna be cleaner and easier... – Cryometer
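The stub-file adaptation described in the comment above could be sketched like this for the single-crate case (the touch is there so cargo notices that the real main.rs is newer than the stub; this is a sketch, not a tested Dockerfile):

```dockerfile
FROM rust:1.33.0

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .

# Stub source so the dependency-compiling layer can succeed;
# this layer is cached until Cargo.toml changes
RUN mkdir -p src && echo "fn main() {}" > src/main.rs && \
    cargo build -Z unstable-options --out-dir /output

# Now the real sources; only these layers rebuild on a code change
COPY src/ ./src/
RUN touch src/main.rs && \
    cargo build -Z unstable-options --out-dir /output
```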

With the integration of BuildKit into docker, if you are able to avail yourself of the superior BuildKit backend, it's now possible to mount a cache volume during a RUN command, and IMHO, this has become the best way to cache cargo builds. The cache volume retains the data that was written to it on previous runs.

To use BuildKit, you'll mount two cache volumes, one for the cargo dir, which caches external crate sources, and one for the target dir, which caches all of your built artifacts, including external crates and the project bins and libs.

If your base image is rust, $CARGO_HOME is set to /usr/local/cargo, so your command looks like this:

RUN --mount=type=cache,target=/usr/local/cargo,from=rust,source=/usr/local/cargo \
    --mount=type=cache,target=target \
    cargo build

If your base image is something else, you will need to change the /usr/local/cargo bit to whatever the value of $CARGO_HOME is, or else add an ENV CARGO_HOME=/usr/local/cargo line. As a side note, the clever thing would be to literally set target=$CARGO_HOME and let Docker do the expansion, but it doesn't seem to work right: expansion happens, but BuildKit still doesn't persist the same volume across runs when you do this.

Other options for achieving Cargo build caching (including sccache and the cargo wharf project) are described in this github issue.

Sathrum answered 7/1, 2020 at 17:35 Comment(4)
> I'd probably just choose a separate host dir to map cargo to though, so as to decouple the image from anything I might do with Rust on the host. — I'm not sure what you're saying here, but those BuildKit cache volumes aren't bound to host directories. They start from empty. – Sathrum
Oh, never mind then. I thought the extra arguments compared to BMitch's answer were for the host, but they're for the build stage. – Cryometer
That's right, and they're necessary in this case. The reason I posted a separate answer here is that all of the other posts about --mount are hypothetical, but in practice never worked. Every single argument in my solution is required to actually achieve caching. – Sathrum
In later versions of docker/buildkit, this answer's exact code snippet does not work. Specifically, because a cache mount is used for the target folder, the binary produced by the build is left in the cache as well, meaning it is inaccessible to later steps (e.g. RUN ls target). To fix this, you must move the produced binary outside of the cached target folder before that step completes, as shown here: ectobit.com/blog/rust-container-image-buildkit-buildx – Adao
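Per that last comment, moving the binary out of the cached target directory within the same RUN might look like this (a sketch; the binary name mwe is a placeholder for your crate's binary):

```dockerfile
# Build with both cache mounts, then copy the binary out of the
# cached target directory so later, uncached steps can still see it.
RUN --mount=type=cache,target=/usr/local/cargo,from=rust,source=/usr/local/cargo \
    --mount=type=cache,target=target \
    cargo build --release && \
    cp target/release/mwe /usr/local/bin/mwe
```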

I'm sure you can adjust this code for use with a Dockerfile, but I wrote a dockerized drop-in replacement for cargo that you can save into your package directory and run as ./cargo build --release. It just works for (most) development (it uses rust:latest), but isn't set up for CI or anything.

Usage: ./cargo build, ./cargo build --release, etc

It will use the current working directory and save the cache to ./.cargo. (You can ignore the entire directory in your version control and it doesn't need to exist beforehand.)

Create a file named cargo in your project's folder, run chmod +x ./cargo on it, and place the following code in it:

#!/bin/bash

# This is a drop-in replacement for `cargo`
# that runs in a Docker container as the current user
# on the latest Rust image
# and saves all generated files to `./cargo/` and `./target/`.
#
# Be sure to make this file executable: `chmod +x ./cargo`
#
# # Examples
#
# - Running app: `./cargo run`
# - Building app: `./cargo build`
# - Building release: `./cargo build --release`
#
# # Installing globally
#
# To run `cargo` from anywhere,
# save this file to `/usr/local/bin`.
# You'll then be able to use `cargo`
# as if you had installed Rust globally.
sudo docker run \
    --rm \
    --user "$(id -u)":"$(id -g)" \
    --mount type=bind,src="$PWD",dst=/usr/src/app \
    --workdir /usr/src/app \
    --env CARGO_HOME=/usr/src/app/.cargo \
    rust:latest \
    cargo "$@"
Jackijackie answered 22/11, 2019 at 20:10 Comment(0)

I figured out how to get this also working with cargo workspaces, using romac's fork of cargo-build-deps.

This example has myapp and two workspace members: utils and db.

FROM rust:nightly as rust

# Cache deps
WORKDIR /app
RUN sudo chown -R rust:rust .
RUN USER=root cargo new myapp

# Install cache-deps
RUN cargo install --git https://github.com/romac/cargo-build-deps.git

WORKDIR /app/myapp
RUN mkdir -p db/src/ utils/src/

# Copy the Cargo tomls
COPY myapp/Cargo.toml myapp/Cargo.lock ./
COPY myapp/db/Cargo.toml ./db/
COPY myapp/utils/Cargo.toml ./utils/

# Cache the deps
RUN cargo build-deps

# Copy the src folders
COPY myapp/src ./src/
COPY myapp/db/src ./db/src/
COPY myapp/utils/src/ ./utils/src/

# Build for debug
RUN cargo build
Assumed answered 2/9, 2020 at 15:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.