Incremental `docker image save <images> | xz -zc - > images.tar.xz`
We have a Docker Compose project including various services, some of which share common base images. After building all images, one of our build job's post-build steps is to run docker image save <images> | xz -zc - > images.tar.xz to create a single compressed archive of all the images – to be used in an offline-deployment fallback strategy (so we can transport these images via USB or CD media rather than a Docker registry).

The uncompressed docker image save <images> tar-stream is about 2 GB in size. After piping it through xz, the compressed images.tar.xz is only about 500 MB.

This build job runs very often, and most of the time only a few images will have changed. However, the aforementioned docker … | xz … pipeline always recreates images.tar.xz in its entirety, which takes up most of the time in the overall build job. I'd like to optimize that.

Is there a way to speed up incremental builds?

I thought about running docker image save <imageN> | xz -zc - > imageN.tar.xz for each image individually, so that I only have to re-save modified images, but this would roughly double the required storage, because docker image save includes the duplicate base images in each individual call.

I would very much like to keep a single docker image save <images> invocation, but only update or re-compress the actual changes relative to a previous images.tar.xz. I know that, because of how tar.xz is structured, small changes – especially near the beginning of the stream – will nonetheless require recreating the whole file. However, I'd gladly accept another solution that involves splitting the tar stream reasonably, such that individual parts can be updated.

Note: Aside from some meta/manifest files at the end, the tar-stream contains a bunch of layer folders, each of which contains a layer.tar and some meta files, corresponding to the (de-duplicated) layers of all the saved images, e.g.:

0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/VERSION
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/json
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/layer.tar
...(~100x4)...
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/VERSION
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/json
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/layer.tar
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/VERSION
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/json
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/layer.tar
manifest.json
repositories

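The splitting idea could look roughly like this. This is a sketch only: everything below the stand-in step is what the real job would do, but the directory names, output paths, and the two fake layer IDs (aaa111, bbb222) are made up so the script is self-contained; in the real pipeline the fabricated stream would be replaced by docker image save <images> | tar -x -C "$WORK". It relies on layer directory names being content-addressed IDs, so an unchanged layer keeps its name across saves and its archive from the previous run can be reused untouched.

```shell
#!/bin/sh
set -eu

WORK=./image-export          # extracted save stream lands here
OUT=./layer-archives         # one .tar.xz per layer, plus a metadata archive
mkdir -p "$WORK" "$OUT"

# In the real build job this step would be:
#   docker image save <images> | tar -x -C "$WORK"
# To keep the sketch self-contained, fabricate a minimal stand-in stream:
fake=$(mktemp -d)
for id in aaa111 bbb222; do
    mkdir "$fake/$id"
    echo 1.0  > "$fake/$id/VERSION"
    echo '{}' > "$fake/$id/json"
    tar -cf "$fake/$id/layer.tar" -C "$fake/$id" VERSION
done
echo '[]' > "$fake/manifest.json"
echo '{}' > "$fake/repositories"
tar -c -C "$fake" . | tar -x -C "$WORK"

# Compress each layer directory individually. Layer IDs are content-addressed,
# so a layer that did not change keeps its directory name and we can skip it.
for dir in "$WORK"/*/; do
    id=$(basename "$dir")
    if [ ! -f "$OUT/$id.tar.xz" ]; then
        tar -c -C "$WORK" "$id" | xz -zc - > "$OUT/$id.tar.xz"
    fi
done

# The small metadata files change whenever tags move, so always refresh them.
tar -c -C "$WORK" manifest.json repositories | xz -zc - > "$OUT/meta.tar.xz"
```

On the target machine, the parts could then be reassembled by extracting every per-layer archive plus meta.tar.xz into one directory and feeding a fresh tar -c of that directory to docker image load.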
PS: I'm already using pxz instead of xz to utilize all CPU cores during compression, but it still takes a considerable amount of time.
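For reference, xz itself (version 5.2 and later) can multi-thread natively via -T0, which may make pxz unnecessary; threaded mode compresses independent blocks, which costs a little compression ratio but also allows parallel decompression later. A tiny self-contained demonstration (sample.bin is a made-up stand-in for the real save stream):

```shell
# xz >= 5.2 multi-threads natively: -T0 spawns one worker per CPU core.
head -c 1048576 /dev/urandom > sample.bin   # stand-in for the save stream
xz -T0 -zc sample.bin > sample.bin.xz       # compress with all cores
xz -dc sample.bin.xz | cmp - sample.bin     # verify lossless round-trip
```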

Schizoid asked 7/4, 2017 at 9:40 · Comments (5):
Why don't you use a Docker registry (like the free Docker Registry or Portus)? A registry is a solution for storing images efficiently, and if it is located on the same network, data transfers are quick. – Jett
We do use a registry to deploy our images, so images can easily be updated from the cloud (once the Docker Compose project gets updated via some Debian package). However, some of our customers don't allow cloud communication and thus can't update online, so we have an additional, fairly "automated" offline deployment strategy that involves a USB stick and some minimal manual labor. This needs to be tested as well, so it is part of our continuous integration pipeline. – Schizoid
Maybe we should split this offline stuff into a separate build that only runs once a day and on final releases, or something. – Schizoid
This offline stuff could pull from your registry (from within your own network – no cloud) and then deliver the images to the customer on a USB stick. One issue is how you deliver your images; the other is how you store them. You can still use the registry for storage and change only the delivery, because a registry does storage really well. – Jett
My question is about neither the online delivery nor the storage of Docker images. Yes, we do have a registry for that job. My question is about optimizing incremental docker image save … | xz -zc - invocations. If that's hard to optimize, I will look into decreasing the number of times these invocations take place. – Schizoid
