Docker downloads newer image for supposedly-cached digest

My Dockerfile has this first step:

FROM python:3.6.10@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11

The goal of this is to "lock" or "pin" the version of the image.

For a while, docker build correctly used the cached version:

Step 1/2 : FROM python:3.6.10@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11
 ---> 114ae8bdb954

But after some time, it decided to "download a newer image":

Step 1/2 : FROM python:3.6.10@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11
sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11: Pulling from library/python
7e2b2a5af8f6: Pulling fs layer
09b6f03ffac4: Pulling fs layer
dc3f0c679f0f: Pulling fs layer
fd4b47407fc3: Pulling fs layer
bb7b28578995: Pulling fs layer
6ebea4a9a306: Pulling fs layer
22a2327cd1ca: Pulling fs layer
bfbf91c84bbe: Pulling fs layer
f6b29b259c5c: Pulling fs layer
09b6f03ffac4: Verifying Checksum
09b6f03ffac4: Download complete
dc3f0c679f0f: Download complete
7e2b2a5af8f6: Verifying Checksum
7e2b2a5af8f6: Download complete
6ebea4a9a306: Verifying Checksum
6ebea4a9a306: Download complete
fd4b47407fc3: Verifying Checksum
fd4b47407fc3: Download complete
bfbf91c84bbe: Verifying Checksum
bfbf91c84bbe: Download complete
f6b29b259c5c: Verifying Checksum
f6b29b259c5c: Download complete
22a2327cd1ca: Verifying Checksum
22a2327cd1ca: Download complete
bb7b28578995: Verifying Checksum
bb7b28578995: Download complete
7e2b2a5af8f6: Pull complete
09b6f03ffac4: Pull complete
dc3f0c679f0f: Pull complete
fd4b47407fc3: Pull complete
bb7b28578995: Pull complete
6ebea4a9a306: Pull complete
22a2327cd1ca: Pull complete
bfbf91c84bbe: Pull complete
f6b29b259c5c: Pull complete
Digest: sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11
Status: Downloaded newer image for python@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11
 ---> 114ae8bdb954

even though the final hash of this step is identical:

 ---> 114ae8bdb954

As I understand, digests (sha256:...) are immutable.
So are they mutable after all?
Or was the cached version somehow deleted?
What's going on and how do I fix it?

Tresatrescha answered 3/5, 2020 at 9:19 Comment(8)
Is it doing this for every run? – Elodea
@Elodea Upon closer inspection, it seems that the first run of every day doesn't use the cache, but the rest do. If the cache is cleaned every day, how can I configure it otherwise? – Tresatrescha
Which system are you running this on? (Win, Mac, Linux) – Elodea
@Elodea On a docker:1.11 container. – Tresatrescha
@Elodea Correction: not all of the rest use the cache. I thought I had found a pattern, but it behaves nondeterministically. – Tresatrescha
So this is a Docker-in-Docker scenario? Is this a long-running container, or do you do a volume mount? – Elodea
@Elodea Via the Kubernetes plugin for Jenkins. – Tresatrescha
BuildKit (now the default with Docker) has "Build reproducibility". See my updated answer below. – Sofia

Given that this doesn't happen on every run, and likely wouldn't happen if you tested locally, the issue doesn't appear to be with your Dockerfile or FROM line. Docker does not automatically clean its cache, so you'll want to investigate what external process is deleting it. Since you are running your builds in Jenkins with the Kubernetes plugin, the likely cause is that plugin cleaning up build agents after a timeout. From the documentation, these settings tune that behavior:

  • podRetention Controls the behavior of keeping slave pods. Can be 'never()', 'onFailure()', 'always()', or 'default()' - if empty will default to deleting the pod after activeDeadlineSeconds has passed.
  • activeDeadlineSeconds If podRetention is set to 'never()' or 'onFailure()', pod is deleted after this deadline is passed.
  • idleMinutes Allows the Pod to remain active for reuse until the configured number of minutes has passed since the last step was executed on it.
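As one illustration (not taken from the question's setup), idleMinutes can be set in a scripted pipeline's podTemplate so the agent pod, and with it the local Docker build cache, survives between consecutive builds. The label and image names here are hypothetical:

```groovy
// Hypothetical Jenkins scripted pipeline using the Kubernetes plugin.
// idleMinutes keeps the agent pod (and its local Docker cache) alive
// for reuse instead of deleting it as soon as the build finishes.
podTemplate(
    label: 'docker-build',        // hypothetical agent label
    idleMinutes: 60,              // keep the pod warm for an hour
    containers: [
        containerTemplate(name: 'docker', image: 'docker:19.03', ttyEnabled: true)
    ]
) {
    node('docker-build') {
        container('docker') {
            sh 'docker build -t myapp:latest .'
        }
    }
}
```

This trades cluster resources for cache hits; a pod that is never reclaimed still goes away when the node does, which is why the --cache-from approach below is the more robust fix.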

One way to work around ephemeral build agents is the --cache-from option on docker build. With the classic builder (as opposed to BuildKit), you need to pull the cache image locally first. That image would come from a previous build, and you can pass multiple images for your cache, which is particularly useful for multi-stage builds since you'll need a cache image for each stage. The flag tells Docker to trust an image pulled from a registry; normally only locally built images are trusted (otherwise someone could push a malicious image that claims to have run the steps of a popular image but includes malware in a layer's tar).
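Sketched as shell steps (the registry path myregistry/myapp is an assumption, not from the question):

```shell
# Pull the image from the previous build so the classic builder can
# reuse its layers; ignore the failure on the very first build.
docker pull myregistry/myapp:latest || true

# Build, telling Docker to trust the pulled image as a cache source.
docker build --cache-from myregistry/myapp:latest -t myregistry/myapp:latest .

# Push so the next (fresh) build agent can pull it as cache.
docker push myregistry/myapp:latest
```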

Inanity answered 9/5, 2020 at 23:2 Comment(3)
Thank you for this highly thorough answer! When I tried to use --cache-from, I found out that my image, docker:1.11, is so old that it doesn't support it. Upgrading to docker:1.13, which first introduced this feature, seems to have done it even without the flag. (First I tried with the flag, then without, and it still worked on different branches.) Cheers! – Tresatrescha
If the image has already been built on the node, there's no need for --cache-from, since Docker trusts images it built itself. Once the builder is pruned by Jenkins, you'll need to pull the image and use the flag in the next builder instance to avoid the cache miss. I just include that in the pipeline for every build, since pulling an image you already have is very fast. – Inanity
I'd also recommend getting on the latest stable Docker image if possible; a lot has changed in the 3+ years since 1.13 to improve builds, including multi-stage builds, and BuildKit support is also available in newer releases. – Inanity

There are two kinds of digests: the digest of the image manifest in the registry, and the digest of the local image's JSON configuration, which in turn contains digests of the image content.

The first kind, python:3.6.10@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11, is the digest of the manifest in Docker Hub, used as a reference.

Digests are not mutable.

If two different inputs produced the same digest value, then the hash function (sha256 in this case) would be broken and could no longer be used. See hash collision.
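That determinism is easy to demonstrate with sha256sum (the "manifest bytes" here are made-up strings, not a real image manifest):

```shell
# A sha256 digest is a pure function of the input bytes: hashing the
# same content twice always yields the same digest, so a pinned digest
# cannot silently start pointing at different content.
printf 'fake manifest bytes' | sha256sum
printf 'fake manifest bytes' | sha256sum   # same digest again

# Changing a single byte produces a completely different digest.
printf 'Fake manifest bytes' | sha256sum
```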

In your case, Docker no longer found the cached image for some reason, so it downloaded the same image again.

The digest at the end of the step (---> 114ae8bdb954) is the digest of the resulting configuration for that image (the image ID).

You can confirm that correct manifest was downloaded:

docker inspect 114ae8bdb954 

There is included:

"RepoDigests": [
            "python@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11"
        ],
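To extract just that field, docker inspect accepts a Go template (using the image ID from the question):

```shell
docker inspect --format '{{index .RepoDigests 0}}' 114ae8bdb954
# python@sha256:6cd232ed00e729b4d4d3aa57c1764dddfab70f616042b7f36536e2c3d70c4c11
```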

As the image ID is identical in both cases, I think there is nothing to fix. However, if it happens on every build, there is a problem with caching.

Edit about caching: in a Docker-in-Docker scenario, the image will be downloaded again whenever the inner Docker daemon's cache is recreated along with its parent container before the build stage runs.

More information about image id: https://windsock.io/explaining-docker-image-ids/

Lymphoma answered 9/5, 2020 at 19:51 Comment(1)
Thank you for your answer. You claim this scenario will always rebuild the image, but the accepted answer explains how to avoid that. – Tresatrescha

Note (Apr. 2022): there is an RFC for pinning "sources": moby/buildkit issue 2794, from Akihiro Suda (Software Engineer at NTT Corp).
It would introduce:

Dockerfile.sum is an equivalent of go.sum but s/go/Dockerfile/.

The content is a subset of BuildInfo:

{
   "sources": [
     {
       "type": "docker-image",
       "ref": "docker.io/library/alpine:latest",
       "pin": "sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
     },
     {
       "type": "http",
       "ref": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md",
       "pin": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"
     }
   ]
}

When `Dockerfile.sum` exists in the context, the Dockerfile builder does:

- Pinning the digest of docker-image sources (`FROM ...`)
- Pinning the digest of http sources (`ADD https://...`)
- Recording the consumed entries to the build info structure (`["containerimage.buildinfo"].consumedPin`)

In the future, Dockerfile should also support `ADD git://...` and pinning its commit hash.

It is a way to expose the build-info feature to the end user, as part of Build reproducibility.


Update Sept. 2023: Akihiro Suda now adds:

We have source-policy-file now: Buildkit / Reproducing the pinned dependencies (since BuildKit v0.11, Jan. 2023):

Example:

{
  "rules": [
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "docker-image://docker.io/library/alpine:latest"
      },
      "updates": {
        "identifier": "docker-image://docker.io/library/alpine:latest@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454"
      }
    },
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "https://raw.githubusercontent.com/moby/buildkit/v0.10.1/README.md"
      },
      "updates": {
        "attrs": {"http.checksum": "sha256:6e4b94fc270e708e1068be28bd3551dc6917a4fc5a61293d51bb36e6b75c4b53"}
      }
    }
  ]
}
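To apply such a policy, BuildKit reads it from an environment variable (the file name policy.json is an assumption; the variable is the experimental switch documented with this feature):

```shell
# Matching sources are converted to their pinned, digest-qualified
# form before the build runs.
EXPERIMENTAL_BUILDKIT_SOURCE_POLICY=policy.json docker buildx build .
```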
Sofia answered 10/4, 2022 at 20:10 Comment(0)

When running this

docker run -d -p 4445:4444 --platform linux/amd64 selenium/standalone-chrome:4.2.2

I saw this

Status: Downloaded newer image for selenium/standalone-chrome:4.2.2

This made me think Docker was ignoring my request for 4.2.2 and pulling newer images instead, which I specifically didn't want it to do.

However, it's just an ambiguous/misleading message from Docker. What it means is that Docker found a newer build of the selenium/standalone-chrome:4.2.2 image on Docker Hub than the one I had locally: the tag 4.2.2 had been re-pushed to point at updated content. It does not mean that Docker pulled a version other than 4.2.2.
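One way to see what actually changed is to compare the local and remote digests for the tag (commands only; the output depends on your local state):

```shell
# Digest of the locally cached image for this tag.
docker images --digests selenium/standalone-chrome

# Digest of the manifest currently published under the tag,
# fetched without pulling the image.
docker manifest inspect selenium/standalone-chrome:4.2.2
```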

Stonechat answered 27/8, 2023 at 0:26 Comment(0)
