We have a Docker Compose project comprising various services, some of which share common base images. After building all images, one of our build job's post-build steps runs `docker image save <images> | xz -zc - > images.tar.xz` to create a single compressed archive of all the images. The archive serves as an offline-deployment fallback, so we can transport the images via USB or CD media rather than a Docker registry.
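In simplified shell form, that post-build step is roughly the following (the image names in `IMAGES` are placeholders):

```bash
# Export all project images into a single tar stream and compress it in one go
IMAGES=(service-a:latest service-b:latest)    # placeholder image list
docker image save "${IMAGES[@]}" | xz -zc - > images.tar.xz
```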
The uncompressed `docker image save <images>` tar stream is about 2 GB in size. After piping it through `xz`, the compressed `images.tar.xz` is only about 500 MB.
This build job runs very often, and most of the time only a few images have changed. However, the `docker … | xz …` pipeline above always recreates `images.tar.xz` in its entirety, which takes up most of the time in the overall build job. I'd like to optimize that.
Is there a way to speed up incremental builds?
I thought about running `docker image save <imageN> | xz -zc - > imageN.tar.xz` for each image individually, so that only modified images need to be re-saved, but this would require roughly twice as much storage, because `docker image save` duplicates the shared base images across the individual archives (see the sketch below).
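For illustration, the per-image variant I have in mind would look roughly like this (the file-name mangling is just an example); every resulting archive then repeats the shared base layers:

```bash
# Hypothetical per-image export: shared base layers end up in every archive
# IMAGES as above: the list of image names to export
for img in "${IMAGES[@]}"; do
    safe_name=$(printf '%s' "$img" | tr ':/' '__')    # e.g. repo/app:latest -> repo_app_latest
    docker image save "$img" | xz -zc - > "${safe_name}.tar.xz"
done
```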
I would very much like to keep a single `docker image save <images>` invocation, but only update or re-compress the actual changes relative to a previous `images.tar.xz`. I know that, because of how `tar.xz` is structured, small changes – especially near the beginning of the stream – force the whole file to be recreated nonetheless. However, I'd gladly consider another solution that splits the tar stream in a reasonable way, so that individual parts can be updated (see the sketch below).
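To make the kind of splitting I have in mind concrete, here is an untested sketch that extracts the save stream into a scratch directory and compresses each layer folder separately, skipping layers that were already compressed in a previous run (the directory names and the caching condition are assumptions on my part):

```bash
# Untested sketch: split the save stream per layer so that unchanged layers
# do not have to be re-compressed on every run
workdir=images-split    # assumed scratch directory
outdir=archives         # assumed output directory for the per-layer pieces
mkdir -p "$workdir" "$outdir"

docker image save "${IMAGES[@]}" | tar -x -C "$workdir"

for layer in "$workdir"/*/; do
    id=$(basename "$layer")
    # only (re-)compress layer folders that have no compressed piece yet
    if [ ! -f "$outdir/$id.tar.xz" ]; then
        tar -c -C "$workdir" "$id" | xz -zc - > "$outdir/$id.tar.xz"
    fi
done

# manifest.json and repositories change whenever any image changes,
# so they are re-packed on every run
tar -c -C "$workdir" manifest.json repositories | xz -zc - > "$outdir/meta.tar.xz"
```

On the target machine the pieces would of course have to be recombined into a stream that `docker image load` accepts, e.g. by decompressing them back into one directory; I haven't validated that end to end.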
Note: Aside from some meta/manifest files at the end, the tar stream contains a set of layer folders, each of which holds a `layer.tar` and some meta files, corresponding to the (de-duplicated) layers of all the saved images, e.g.:
```
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/VERSION
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/json
0166389787802d9a6c19a832fcfe976c30144d2430e798785110d8e8e562dab6/layer.tar
...(~100x4)...
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/VERSION
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/json
fa498ee40da8c70be99b8f451813d386b45da891353d7184cdb8dd1b40efca03/layer.tar
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/VERSION
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/json
ffb2e673ba3e63b6b5922a482783b072759f0b83335a5ffab0b36dc804a24b93/layer.tar
manifest.json
repositories
```
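For reference, that listing was produced with something along the lines of:

```bash
# List the members of the save stream without writing it to disk
docker image save "${IMAGES[@]}" | tar -t | sort
```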
PS: I'm already using `pxz` instead of `xz` to utilize all CPU cores during compression, but it still takes a considerable amount of time.
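That is, the current pipeline is roughly the following, assuming `pxz` accepts the same flags as `xz` here:

```bash
# Same pipeline, with pxz standing in for xz to use all CPU cores
docker image save "${IMAGES[@]}" | pxz -zc - > images.tar.xz
```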
`docker image save … | xz -zc -` invocations. If that's hard to optimize, I will look into reducing the number of times these invocations have to run. – Schizoid