Bundle git repository without cloning it

How can I bundle a git project without cloning it every time? Right now I always run the commands below.

git clone --mirror http://git_project
cd git_project
git bundle create '../git_project.lock' --all
cd ..
rm git_project -Force -Recurse

I want to do this in one command, something like:

git bundle create '../git_project.lock' --all --repository http://git_project
Dispassionate answered 7/2, 2019 at 11:12 Comment(6)
No way: git bundle works only with a local repository. Why do you clone the project every time? Clone it once and update it later with git fetch/pull. You can improve on this by using a bare repo. Jocelin
@phd: I want to keep these bundle files as a backup on another machine, as in this answer: #5578770 Dispassionate
Then run git bundle on that other machine. Jocelin
@phd: It's in the cloud... We don't have access to that machine... Dispassionate
Then a local clone is your only option. But you don't need to clone every time. Clone once, preserve the clone, and then update it with git fetch/pull. Create backup bundles from the clone as usual. Jocelin
@phd: But when you don't remove your repository, git bundle create makes no sense at all. I'll think about what the best approach is. Thanks for the info @Jocelin :) Dispassionate

I want to keep these bundle files as a backup on another machine

Since my 2011 answer (eleven years ago), you now have remote pipelines like GitHub Actions or GitLab CI.

An automated pipeline on those remote Git repository hosting services can create a bundle for you and save it to a server/backup.
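
Whether it runs as a pipeline step or as a scheduled job on a machine you control, the "clone once, then fetch and re-bundle" approach suggested in the comments can be sketched as below. This is only a minimal sketch: the helper name backup_bundle and the paths in the usage note are hypothetical, and the bundle path is assumed to be absolute because git -C changes the working directory.

```shell
#!/bin/sh
set -eu

# backup_bundle URL MIRROR_DIR BUNDLE_FILE
# Hypothetical helper. BUNDLE_FILE must be an absolute path, because
# "git -C" changes the working directory before the bundle is written.
backup_bundle() {
    url=$1
    mirror=$2
    bundle=$3

    if [ -d "$mirror" ]; then
        # The mirror already exists: fetch only what changed since last run.
        git -C "$mirror" fetch --quiet --prune origin
    else
        # First run: a single full mirror clone, kept for later runs.
        git clone --quiet --mirror "$url" "$mirror"
    fi

    # Bundle every ref of the refreshed mirror.
    git -C "$mirror" bundle create "$bundle" --all
}
```

A call such as `backup_bundle http://git_project /backups/git_project.git /backups/git_project.bundle` would then replace the clone/bundle/delete sequence from the question, and only the first run pays for a full clone.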

That is today.


Tomorrow, you will be able to use a "git bundle"-dedicated server, accessible through a bundle URI.

With Git 2.38 (Q3 2022), the "bundle URI" design gets documented.

See commit d06ed85, commit 2da14fa (09 Aug 2022) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 0d133a3, 18 Aug 2022)

docs: document bundle URI standard

Signed-off-by: Derrick Stolee

Introduce the idea of bundle URIs to the Git codebase through an aspirational design document.
This document includes the full design intended to include the feature in its fully-implemented form.
This will take several steps as detailed in the Implementation Plan section.

By committing this document now, it can be used to motivate changes necessary to reach these final goals.
The design can still be altered as new information is discovered.

technical/bundle-uri now includes in its man page:

Bundle URIs

Git bundles are files that store a pack-file along with some extra metadata, including a set of refs and a (possibly empty) set of necessary commits. See git bundle and the bundle format for more information.

Bundle URIs are locations where Git can download one or more bundles in order to bootstrap the object database in advance of fetching the remaining objects from a remote.

One goal is to speed up clones and fetches for users with poor network connectivity to the origin server. Another benefit is to allow heavy users, such as CI build farms, to use local resources for the majority of Git data and thereby reducing the load on the origin server.

To enable the bundle URI feature, users can specify a bundle URI using command-line options or the origin server can advertise one or more URIs via a protocol v2 capability.

And:

bundle-uri: add example bundle organization

Signed-off-by: Derrick Stolee

Add a section that details how a bundle provider could work, including using the Git server advertisement for multiple geo-distributed servers.
This organization is based on the GVFS Cache Servers which have successfully used similar ideas to provide fast object access and reduced server load for very large repositories.

technical/bundle-uri now includes in its man page:

Example Bundle Provider organization

This example organization is a simplified model of what is used by the GVFS Cache Servers (see section near the end of this document) which have been beneficial in speeding up clones and fetches for very large repositories, although using extra software outside of Git.

The bundle provider deploys servers across multiple geographies.
Each server manages its own bundle set.

The server can track a number of Git repositories, but provides a bundle list for each based on a pattern.

For example, when mirroring a repository at https://<domain>/<org>/<repo> the bundle server could have its bundle list available at https://<server-url>/<domain>/<org>/<repo>.
The origin Git server can list all of these servers under the "any" mode:

[bundle]
version = 1
mode = any

[bundle "eastus"]
uri = https://eastus.example.com/<domain>/<org>/<repo>

[bundle "europe"]
uri = https://europe.example.com/<domain>/<org>/<repo>

[bundle "apac"]
uri = https://apac.example.com/<domain>/<org>/<repo>

This "list of lists" is static and only changes if a bundle server is added or removed.

The bundle server runs regularly-scheduled updates for the bundle list, such as once a day.
During this task, the server fetches the latest contents from the origin server and generates a bundle containing the objects reachable from the latest origin refs, but not contained in a previously-computed bundle.
This bundle is added to the list, with care that the creationToken is strictly greater than the previous maximum creationToken.

An example bundle list is provided here, although it only has two daily bundles and not a full list of 30:

[bundle]
version = 1
mode = all
heuristic = creationToken

[bundle "2022-02-13-1644770820-daily"]
uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-13-1644770820-daily.bundle
creationToken = 1644770820

[bundle "2022-02-09-1644442601-daily"]
uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-09-1644442601-daily.bundle
creationToken = 1644442601

[bundle "2022-02-02-1643842562"]
uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-02-1643842562.bundle
creationToken = 1643842562

The intention of this data organization has two main goals.

  • First, initial clones of the repository become faster by downloading precomputed object data from a closer source.

  • Second, git fetch commands can be faster, especially if the client has not fetched for a few days. However, if a client does not fetch for 30 days, then the bundle list organization would cause redownloading a large amount of object data.
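
The scheduled update described above (fetch, bundle only what older bundles lack, register the bundle with a strictly increasing creationToken) could look roughly like this. It is a sketch under stated assumptions, not how any real bundle server is implemented: the helper name add_daily_bundle and the file names bundle-list and bundled-tips are hypothetical, and remembering the previous ref tips in a plain file is a simplification.

```shell
#!/bin/sh
set -eu

# add_daily_bundle MIRROR OUT_DIR BASE_URL
# Hypothetical helper for a bundle server's scheduled job.
#   MIRROR   bare mirror clone of the origin repository
#   OUT_DIR  absolute path of the directory the bundle server publishes
#   BASE_URL public URL that maps to OUT_DIR
add_daily_bundle() {
    mirror=$1
    out=$2
    base=$3
    list=$out/bundle-list      # file names are assumptions of this sketch
    tips=$out/bundled-tips     # ref tips already covered by older bundles

    git -C "$mirror" fetch --quiet --prune origin
    touch "$tips"

    # Stop early when older bundles already cover every reachable commit.
    new=$(git -C "$mirror" rev-list --count --all --not $(cat "$tips"))
    [ "$new" -gt 0 ] || return 0

    # Use the current time as creationToken, bumped if needed so that it
    # is strictly greater than every token already in the list.
    token=$(date +%s)
    max=$(git config --file "$list" --get-regexp \
              '^bundle\..*\.creationtoken$' 2>/dev/null |
          cut -d' ' -f2 | sort -n | tail -n 1)
    [ "$token" -gt "${max:-0}" ] || token=$((max + 1))

    # Bundle only what the previous bundles do not contain.
    id=$(date -u +%Y-%m-%d)-$token-daily
    git -C "$mirror" bundle create "$out/$id.bundle" \
        --all --not $(cat "$tips")
    git -C "$mirror" for-each-ref --format='%(objectname)' >"$tips"

    # Register the new bundle in the list.
    git config --file "$list" bundle.version 1
    git config --file "$list" bundle.mode all
    git config --file "$list" bundle.heuristic creationToken
    git config --file "$list" "bundle.$id.uri" "$base/$id.bundle"
    git config --file "$list" "bundle.$id.creationToken" "$token"
}
```

Writing the list with git config --file keeps it in the same config syntax as the examples above. Expiring old entries, so that the list stays at about 30 bundles, is left out of the sketch.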


This is implemented (still with Git 2.38 (Q3 2022)) as "git clone --bundle-uri".

See commit 65da938 (23 Aug 2022), and commit e21e663, commit 59c1752, commit 5556891, commit 53a5089, commit b5624a4 (09 Aug 2022) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 68ef042, 01 Sep 2022)

clone: add --bundle-uri option

Reviewed-by: Josh Steadmon
Signed-off-by: Derrick Stolee

Cloning a remote repository is one of the most expensive operations in Git.
The server can spend a lot of CPU time generating a pack-file for the client's request.
The amount of data can clog the network for a long time, and the Git protocol is not resumable.
For users with poor network connections or who are located far away from the origin server, this can be especially painful.

Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from a bundle.
If the user is aware of a bundle server, then they can tell Git to bootstrap the new repository with these bundles before fetching the remaining objects from the origin server.

git clone now includes in its man page:

--bundle-uri=<uri>

Before fetching from the remote, fetch a bundle from the given <uri> and unbundle the data into the local repository.

The refs in the bundle will be stored under the hidden refs/bundle/* namespace.

This option is incompatible with --depth, --shallow-since, and --shallow-exclude.
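
As an illustration of the option, here is a minimal sketch that assumes Git 2.38 or later. The wrapper name clone_with_bundle is hypothetical, and the listing uses the refs/bundles/ prefix, which is the namespace named in the Git 2.46 commit message quoted further down.

```shell
#!/bin/sh
set -eu

# clone_with_bundle BUNDLE_URI ORIGIN_URL TARGET_DIR
# Hypothetical wrapper; needs Git 2.38 or later for --bundle-uri.
clone_with_bundle() {
    # Unpack the bundle first, then fetch whatever is still missing
    # from the origin server.
    git clone --quiet --bundle-uri="$1" "$2" "$3"

    # Refs that came from the bundle are kept apart from regular branches.
    git -C "$3" for-each-ref --format='%(refname)' refs/bundles/
}
```

The bundle itself can be produced by any machine that already has a clone, for example with git bundle create project.bundle --all, and then published wherever the clients can reach it.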


Git 2.39 (Q4 2022) defines the logical elements of a "bundle list", a data structure to store them in-core, a format to transfer them, and code to parse them.

See commit 8628a84, commit 70334fc, commit 89bd7fe, commit c23f592, commit c96060b, commit 20c1e2a, commit 738e524, commit bff03c4, commit 0634f71, commit 23b6d00 (12 Oct 2022) by Derrick Stolee (derrickstolee).
See commit d796ced, commit 9424e37 (12 Oct 2022) by Ævar Arnfjörð Bjarmason (avar).
See commit f677f62 (24 Aug 2022) by Junio C Hamano (gitster).
(Merged by Taylor Blau -- ttaylorr -- in commit d32dd8a, 30 Oct 2022)

bundle-uri: fetch a list of bundles

Signed-off-by: Derrick Stolee

When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content.
Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list.
Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required.
The content at the selected URIs is downloaded and inspected again, creating a recursive process.

To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now.
This can be converted to a configured value in the future if necessary.
The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists).

To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle.
By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves.
They each only need the base bundle.
The bundle containing the merge commit requires both of the middle bundles, though.
This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied.


Git 2.40 (Q1 2023) continues the bundle URI implementation (part 4).

See commit 876094a, commit 12b0a14, commit ebc3947, commit 9ea5796, commit 738dc7d, commit 1b759e0 (22 Dec 2022) by Derrick Stolee (derrickstolee).
See commit 70b9c10, commit 7cce907, commit 0cfde74, commit 8f788eb, commit 8b8d9a2 (22 Dec 2022) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 0903d8b, 02 Jan 2023)

bundle-uri client: add boolean transfer.bundleURI setting

Co-authored-by: Derrick Stolee
Signed-off-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Derrick Stolee

The yet-to-be introduced client support for bundle-uri will always fall back on a full clone, but we'd still like to be able to ignore a server's bundle-uri advertisement entirely.

The new transfer.bundleURI config option defaults to 'false', but a user can set it to 'true' to enable checking for bundle URIs from the origin Git server using protocol v2.

git config now includes in its man page:

transfer.bundleURI

When true, local git clone commands will request bundle information from the remote server (if advertised) and download bundles before continuing the clone through the Git protocol.
Defaults to false.

And:

bundle-uri: serve bundle.* keys from config

Signed-off-by: Derrick Stolee

Implement the "bundle-uri" protocol v2 capability by populating the key=value packet lines from the local Git config.
The list of bundles is provided from the keys beginning with "bundle.".
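
On the serving side, that means the bundle list lives in the ordinary config of the repository being cloned. The sketch below shows one way to set it up. The helper name advertise_bundle is hypothetical, and the uploadpack.advertiseBundleURIs switch is an assumption about how the advertisement is turned on; check the documentation of your Git version before relying on it.

```shell
#!/bin/sh
set -eu

# advertise_bundle REPO ID URI
# Hypothetical helper: registers one bundle in the config of the
# repository that serves clones, and turns the advertisement on.
advertise_bundle() {
    repo=$1
    id=$2
    uri=$3

    # The same keys as in a bundle list file, stored in the repo config.
    git -C "$repo" config bundle.version 1
    git -C "$repo" config bundle.mode all
    git -C "$repo" config "bundle.$id.uri" "$uri"

    # Assumption: the capability is only advertised when this is true.
    git -C "$repo" config uploadpack.advertiseBundleURIs true
}
```

Clients still have to opt in, for instance by setting transfer.bundleURI to true as described above.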

And:

bundle-uri: allow relative URLs in bundle lists

Signed-off-by: Derrick Stolee

Bundle providers may want to distribute that data across multiple CDNs.
This might require a change in the base URI, all the way to the domain name.
If all bundles require an absolute URI in their 'uri' value, then every push to a CDN would require altering the table of contents to match the expected domain and exact location within it.

Allow a bundle list to specify a relative URI for the bundles.

This URI is based on where the client received the bundle list.
For a list provided in the 'bundle-uri' protocol v2 command, the Git remote URI is the base URI.
Otherwise, the bundle list was provided from an HTTP URI not using the Git protocol, and that URI is the base URI.
This allows easier distribution of bundle data.


With Git 2.40 (Q1 2023), the bundle-URI subsystem adds support for creation-token heuristics to help incremental fetches.

See commit 026df9e, commit c429bed, commit 7f0cc04, commit 0524ad3, commit 4074d3c, commit 7903efb, commit 512fccf, commit c93c3d2, commit 7bc73e7, commit d9fd674, commit e72171f (31 Jan 2023) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 4f59836, 15 Feb 2023)

clone: set fetch.bundleURI if appropriate

Signed-off-by: Derrick Stolee

Bundle providers may organize their bundle lists in a way that is intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in mind, or else the client will not expect to save time by downloading bundles after the initial clone.
This is done by specifying a bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that are not advertised over protocol v2.
If the user specifies a static URI via 'git clone --bundle-uri', then Git can set this config as a reminder for future 'git fetch' operations to check the bundle list before connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different approach and create a property of the remote itself by creating a remote.<id>.* type config key.
That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

git config now includes in its man page:

fetch.bundleURI

This value stores a URI for downloading Git object data from a bundle URI before performing an incremental fetch from the origin Git server.

This is similar to how the --bundle-uri option behaves in git clone.
git clone --bundle-uri will set the fetch.bundleURI value if the supplied bundle URI contains a bundle list that is organized for incremental fetches.


"git fetch --all"(man) does not have to download and handle the same bundleURI over and over, which has been corrected with Git 2.41 (Q2 2023).

See commit 25bccb4 (31 Mar 2023) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 89833fc, 06 Apr 2023)

fetch: download bundles once, even with --all

Signed-off-by: Derrick Stolee

When fetch.bundleURI is set, 'git fetch' downloads bundles from the given bundle URI before fetching from the specified remote.
However, when using non-file remotes, 'git fetch --all' will launch 'git fetch' subprocesses which then read fetch.bundleURI and fetch the bundle list again.
We do not expect the bundle list to have new information during these multiple runs, so avoid these extra calls by un-setting fetch.bundleURI in the subprocess arguments.

Be careful to skip fetching bundles for the empty bundle string.
Fetching bundles from the empty list presents some interesting test failures.


Before Git 2.46 (Q3 2024, batch 19), when the bundle URI interface fetched multiple bundles, Git failed to take full advantage of all of them and ended up downloading duplicate objects. This has been fixed.

See commit 63d903f, commit d0cbc75, commit 3079026 (19 Jun 2024) by Xing Xin (xing).
(Merged by Junio C Hamano -- gitster -- in commit 125e389, 08 Jul 2024)

bundle-uri: verify oid before writing refs

Reviewed-by: Karthik Nayak
Reviewed-by: Patrick Steinhardt
Signed-off-by: Xing Xin

When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones.
This was particularly problematic when employing the "creationTokens" heuristic.

To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A".
Firstly, create a base bundle with:

git bundle create base.bundle main

Then, add a new commit "B" on top of "A", and create an incremental bundle for "main":

git bundle create incr.bundle A..main

Now, generate a bundle list with the following content:

[bundle]
    version = 1
    mode = all
    heuristic = creationToken

[bundle "base"]
    uri = base.bundle
    creationToken = 1

[bundle "incr"]
    uri = incr.bundle
    creationToken = 2

A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository.
However, Git would still download everything from the server, as if it had fetched nothing locally.
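
The setup from that commit message can be collected into one script for experimenting. This is only a sketch: the temporary directory, the throwaway identity, and the use of empty commits to stand in for "A" and "B" are assumptions, not part of the original report.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
# Throwaway identity (an assumption) so commits work on any machine.
git config user.name "Example"
git config user.email "example@example.com"

# Commit "A" and the base bundle.
git commit -q --allow-empty -m A
A=$(git rev-parse main)
git bundle create "$work/base.bundle" main

# Commit "B" and the incremental bundle, which lists A as a prerequisite.
git commit -q --allow-empty -m B
git bundle create "$work/incr.bundle" "$A..main"

# The bundle list, written with git config so the syntax is always valid.
list=$work/bundle-list
git config --file "$list" bundle.version 1
git config --file "$list" bundle.mode all
git config --file "$list" bundle.heuristic creationToken
git config --file "$list" bundle.base.uri base.bundle
git config --file "$list" bundle.base.creationToken 1
git config --file "$list" bundle.incr.uri incr.bundle
git config --file "$list" bundle.incr.creationToken 2

# Sanity check: the incremental bundle applies on top of this repository.
git bundle verify "$work/incr.bundle"
```

The two bundles and the list are left in the temporary directory, next to each other, so the relative uri values in the list resolve to the files just created.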

Tomkins answered 10/9, 2022 at 11:35 Comment(0)
