What are the different repository format versions (for the core.repositoryFormatVersion setting) in Git?

I noticed a Git config option, core.repositoryFormatVersion, which defaults to 0. What are "repository format versions", and what functional difference do they make?

Ferdie asked 3/3, 2011 at 0:26
Four and a half years later, Git 2.7 (Nov. 2015) finally documents core.repositoryFormatVersion, and it is... quite interesting. See my answer below. -- Dauphine

Git 2.7 (Nov. 2015) adds a lot more information in the new Documentation/technical/repository-version.txt.
See commit 067fbd4, commit 00a09d5 (23 Jun 2015) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit fa46579, 26 Oct 2015)

You can now define "extensions", and use core.repositoryformatversion as a "marker" to signal the existence of said extensions, instead of having to bump the repository format version for every incompatible change:

If we were to bump the repository version for every such change, then any implementation understanding version X would also have to understand X-1, X-2, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature).

This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions.

Extracts from the doc:

Every git repository is marked with a numeric version in the core.repositoryformatversion key of its config file. This version specifies the rules for operating on the on-disk repository data.

Note that this applies only to accessing the repository's disk contents directly.
An older client which understands only format 0 may still connect via git:// to a repository using format 1, as long as the server process understands format 1.

Version 0

This is the format defined by the initial version of git, including but not limited to the format of the repository directory, the repository configuration file, and the object and ref storage.

Version 1

This format is identical to version 0, with the following exceptions:

  1. When reading the core.repositoryformatversion variable, a git implementation which supports version 1 MUST also read any configuration keys found in the extensions section of the configuration file.

  2. If a version-1 repository specifies any extensions.* keys that the running git has not implemented, the operation MUST NOT proceed.
    Similarly, if the value of any known key is not understood by the implementation, the operation MUST NOT proceed.

This can be used, for example:

  • to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g., because it has "clone --shared" children)

  • that the refs are stored in a format besides the usual "refs" and "packed-refs" directories

Now that is a truly original take on versioning, compared with the usual release-number and semver policies.

Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats.

For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db".
Older versions of git will not understand format "1" and bail.
Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run.
This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read.

Note that we are only defining the rules for format 1 here.
We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations.
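The version-1 rules and the extensions.refbackend=db example above can be modeled in a few lines. The following Python is a minimal sketch, not Git's implementation (Git does this in C). It assumes the configuration has already been read into a dict of lowercased section.key names, the shape "git config --list" prints, and SUPPORTED is a hypothetical table standing in for the extensions one particular Git build implements.

```python
# Minimal model of the repository format rules quoted above (not Git's C code).
# Assumption: config maps lowercased "section.key" names to raw string values.

BOOL_VALUES = {"true", "false", "yes", "no", "on", "off", "1", "0"}

# Hypothetical table for one particular Git build:
# extension name -> predicate telling whether a value is understood.
SUPPORTED = {
    "preciousobjects": lambda value: value.lower() in BOOL_VALUES,
}


class RepositoryFormatError(Exception):
    """Raised when the operation MUST NOT proceed."""


def check_repository_format(config: dict[str, str], supported=SUPPORTED) -> int:
    version = int(config.get("core.repositoryformatversion", "0"))
    if version not in (0, 1):
        # A format this implementation does not know: bail out early.
        raise RepositoryFormatError(f"unsupported repository format version {version}")
    if version == 0:
        # Per the documented rules, format 0 does not consult extensions.* at all.
        return version
    for key, value in config.items():
        if not key.startswith("extensions."):
            continue
        name = key[len("extensions."):]
        if name not in supported:
            # Rule 2: an unimplemented extensions.* key stops the operation.
            raise RepositoryFormatError(f"unknown extension {name!r}")
        if not supported[name](value):
            # Rule 2: a known key with a value we do not understand also stops it.
            raise RepositoryFormatError(f"unsupported value {value!r} for {key}")
    return version
```

For the refbackend example, a build whose table lacks refbackend raises on the unknown key, and a build whose table has refbackend but does not accept "db" raises on the value; either way it refuses to run instead of guessing where the refs live.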


The first extension, available with Git 2.7, is preciousObjects:

If this extension is used in a repository, then no operations should run which may drop objects from the object storage. This can be useful if you are sharing that storage with other repositories whose refs you cannot see.

The doc mentions:

When the config key extensions.preciousObjects is set to true, objects in the repository MUST NOT be deleted (e.g., by git-prune or git repack -d).

The commit message illustrates it:

For instance, if you do:

$ git clone -s parent child
$ git -C parent config extensions.preciousObjects true
$ git -C parent config core.repositoryformatversion 1

you now have additional safety when running git in the parent repository.
Prunes and repacks will bail with an error, and git gc will skip those operations (it will continue to pack refs and do other non-object operations).
Older versions of Git, when run in the repository, will fail on every operation.

Note that we do not set the preciousObjects extension by default when doing a "clone -s", as doing so breaks backwards compatibility. It is a decision the user should make explicitly.
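A rough model of that safety net: explicit object-deleting commands refuse to run, while gc only drops its object-deleting steps. This is a hypothetical Python sketch, not Git's code; GC_STEPS is a simplified, illustrative list of gc steps, and the config is assumed to be a dict of lowercased section.key names for a repository that has already passed the format-version check.

```python
# Hypothetical model of the preciousObjects behavior described above; not Git's code.
# Assumption: config maps lowercased "section.key" names to raw string values.

TRUE_VALUES = {"true", "yes", "on", "1"}

# Illustrative, simplified gc steps; only the last two may delete objects.
GC_STEPS = ("pack-refs", "reflog expire", "repack -d", "prune")
OBJECT_DELETING = {"repack -d", "prune"}


def precious_objects(config: dict[str, str]) -> bool:
    return config.get("extensions.preciousobjects", "false").lower() in TRUE_VALUES


def check_can_delete_objects(config: dict[str, str], command: str) -> None:
    """Explicit prune / repack -d: bail out with an error."""
    if command in OBJECT_DELETING and precious_objects(config):
        raise PermissionError(f"refusing to run {command!r}: extensions.preciousObjects is set")


def plan_gc(config: dict[str, str]) -> list[str]:
    """gc: skip the object-deleting steps but keep the rest (e.g. packing refs)."""
    if not precious_objects(config):
        return list(GC_STEPS)
    return [step for step in GC_STEPS if step not in OBJECT_DELETING]
```

The split mirrors the commit message: a direct prune or repack is an explicit request that cannot be honored, so it fails, whereas gc is a bundle of chores and can still do the ones that never touch objects.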


Note that this core.repositoryformatversion business is old. Really old: it dates back to commit ab9cb76 (Nov. 2005, Git 0.99.9l).
It was initially introduced for init-db:

This makes init-db repository version aware.

It checks if an existing config file says the repository being reinitialized is of a wrong version and aborts before doing further harm.


Git 2.22 (Q2 2019) fixes memory leaks around the repository_format structure.

See commit e8805af (28 Feb 2019), and commit 1301997 (22 Jan 2019) by Martin Ågren.
(Merged by Junio C Hamano -- gitster -- in commit 6b5688b, 20 Mar 2019)

setup: fix memory leaks with struct repository_format

After we set up a struct repository_format, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space.
In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory.

Introduce an initialization macro REPOSITORY_FORMAT_INIT and a function clear_repository_format(), to be used on each side of read_repository_format(). To have a clear and simple memory ownership, let all users of struct repository_format duplicate the strings that they take from it, rather than stealing the pointers.

Call clear_...() at the start of read_...() instead of just zeroing the struct, since we sometimes enter the function multiple times.
Thus, it is important to initialize the struct before calling read_...(), so document that.
It's also important because we might not even call read_...() before we call clear_...(), see, e.g., builtin/init-db.c.

Teach read_...() to clear the struct on error, so that it is reset to a safe state, and document this. (In setup_git_directory_gently(), we look at repo_fmt.hash_algo even if repo_fmt.version is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.)


With Git 2.28 (Q3 2020), Git itself can upgrade the repository format version automatically, for example when a filter is added to a fetch (such as an --unshallow fetch) in a clone that was not originally partial.

See commit 14c7fa2, commit 98564d8, commit 01bbbbd, commit 16af5f1 (05 Jun 2020) by Xin Li (livid).
(Merged by Junio C Hamano -- gitster -- in commit 1033b98, 29 Jun 2020)

fetch: allow adding a filter after initial clone

Signed-off-by: Xin Li

Retroactively adding a filter can be useful for existing shallow clones as they allow users to see earlier change histories without downloading all git objects in a regular --unshallow fetch.

Without this patch, users can make a clone partial by editing the repository configuration to convert the remote into a promisor, like:

git config core.repositoryFormatVersion 1
git config extensions.partialClone origin
git fetch --unshallow --filter=blob:none origin

Since the hard part of making this work is already in place and such edits can be error-prone, teach Git to perform the required configuration change automatically instead.

Note that this change does not modify the existing Git behavior which recognizes setting extensions.partialClone without changing repositoryFormatVersion.
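The configuration change that Git 2.28 automates can be pictured as a pure function over the config. This Python sketch is hypothetical (make_partial is not a Git function); it mirrors the first two manual commands above, and refusing when a different promisor remote is already registered is a choice made by the sketch, not documented Git behavior.

```python
# Hypothetical model of turning a remote into a promisor; not Git's implementation.
# Assumption: config maps lowercased "section.key" names to raw string values.


def make_partial(config: dict[str, str], remote: str) -> dict[str, str]:
    """Return an updated copy of config in which `remote` is the promisor remote."""
    updated = dict(config)  # leave the caller's config untouched
    current = updated.get("extensions.partialclone")
    if current == remote:
        return updated  # already partial against this remote: nothing to change
    if current is not None:
        # Sketch's own choice: refuse rather than silently replace a promisor.
        raise ValueError(f"already a partial clone of {current!r}")
    version = int(updated.get("core.repositoryformatversion", "0"))
    if version > 1:
        raise ValueError(f"unknown repository format version {version}")
    # Equivalent of: git config core.repositoryFormatVersion 1
    updated["core.repositoryformatversion"] = "1"
    # Equivalent of: git config extensions.partialClone <remote>
    updated["extensions.partialclone"] = remote
    return updated
```

Doing both edits in one step is the point of the patch: a user who sets only one of the two keys by hand ends up with exactly the error-prone configuration the commit message warns about.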


Warning: Git 2.28-rc0 corrected a bug where some repository extensions were honored by mistake even in version 0 repositories (configuration variables in the extensions.* namespace were supposed to have special meaning only in repositories whose version number is 1 or higher), but this turned out to be too big a change, so it was reverted.

See commit 62f2eca, commit 1166419 (15 Jul 2020) by Jonathan Nieder (artagnon).
(Merged by Junio C Hamano -- gitster -- in commit d13b7f2, 16 Jul 2020)

Revert "check_repository_format_gently(): refuse extensions for old repositories"

Reported-by: Johannes Schindelin
Signed-off-by: Jonathan Nieder

This reverts commit 14c7fa269e42df4133edd9ae7763b678ed6594cd.

The core.repositoryFormatVersion field was introduced in ab9cb76f661 ("Repository format version check.", 2005-11-25, Git v0.99.9l -- merge), providing a welcome bit of forward compatibility, thanks to some welcome analysis by Martin Atukunda.

The semantics are simple: a repository with core.repositoryFormatVersion set to 0 should be comprehensible by all Git implementations in active use; and Git implementations should error out early instead of trying to act on Git repositories with higher core.repositoryFormatVersion values representing new formats that they do not understand.

A new repository format did not need to be defined until 00a09d57eb8 (introduce "extensions" form of core.repositoryformatversion, 2015-06-23).

This provided a finer-grained extension mechanism for Git repositories.

In a repository with core.repositoryFormatVersion set to 1, Git implementations can act on "extensions.*" settings that modify how a repository is interpreted.

In repository format version 1, unrecognized extensions settings cause Git to error out.

What happens if a user sets an extension setting but forgets to increase the repository format version to 1?
The extension settings were still recognized in that case; worse, unrecognized extensions settings do not cause Git to error out.

So combining repository format version 0 with extensions settings produces in some sense the worst of both worlds.

To improve that situation, since 14c7fa269e4 (check_repository_format_gently(): refuse extensions for old repositories, 2020-06-05) Git instead ignores extensions in v0 mode. This way, v0 repositories get the historical (pre-2015) behavior and maintain compatibility with Git implementations that do not know about the v1 format.

Unfortunately, users had been using this sort of configuration and this behavior change came to many as a surprise:

  • users of "git config --worktree" that had followed its advice to enable extensions.worktreeConfig (without also increasing the repository format version) would find their worktree configuration no longer taking effect
  • tools such as copybara that had set extensions.partialClone in existing repositories (without also increasing the repository format version) would find that setting no longer taking effect

The behavior introduced in 14c7fa269e4 might be a good behavior if we were traveling back in time to 2015, but we're far too late.

For some reason I thought that it was what had been originally implemented and that it had regressed.

Apologies for not doing my research when 14c7fa269e4 was under development.

Let's return to the behavior we've had since 2015: always act on extensions.* settings, regardless of repository format version.

While we're here, include some tests to describe the effect on the "upgrade repository version" code path.
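The behavior restored by this revert boils down to: known extensions take effect whatever the format version, and only format 1 turns an unrecognized extension into an error. A hypothetical Python sketch of that rule, where KNOWN is an illustrative set built from the extensions named in this thread (not the full list Git supports) and the config is a dict of lowercased section.key names:

```python
# Hypothetical model of the post-revert handling of extensions.*; not Git's code.
# Assumption: the format version is 0 or 1 (higher versions were rejected earlier).

KNOWN = {"preciousobjects", "partialclone", "worktreeconfig"}


def active_extensions(config: dict[str, str], known=KNOWN) -> dict[str, str]:
    version = int(config.get("core.repositoryformatversion", "0"))
    active = {}
    for key, value in config.items():
        if not key.startswith("extensions."):
            continue
        name = key[len("extensions."):]
        if name in known:
            # Acted on regardless of the repository format version.
            active[name] = value
        elif version >= 1:
            # Only format 1 makes an unrecognized extension fatal.
            raise RuntimeError(f"unknown repository extension {name!r}")
        # In format 0 an unrecognized extension is silently ignored:
        # the "worst of both worlds" case the commit message describes.
    return active
```

This is why forgetting to bump core.repositoryFormatVersion went unnoticed for years: the extensions people cared about kept working in format 0, and a typo in an extension name never produced an error.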

Dauphine answered 1/11, 2015 at 15:53

It's for future compatibility: if the Git developers ever find it necessary to change the way repositories are stored on disk to enable some new feature, they can give upgraded repositories a core.repositoryformatversion of 1. Newer versions of Git that know about the new format will then run the code to deal with it, and older versions that don't will gracefully error out with "Expected git repo version <= 0, found 1. Please upgrade Git".

As of now, the only repo format version defined or recognized is 0, which denotes the format that every public release of git has used.

Samellasameness answered 3/3, 2011 at 0:34
Note that Git 2.7 (Nov. 2015, four and a half years later) finally documents core.repositoryFormatVersion. See my answer. -- Dauphine
