Does Git use SHA-256 to calculate commit hashes?

Does the current version of git (2.30.0) already use SHA256 to calculate commit hashes by default?

If not, how can SHA-256 be enabled for a new git repository, and how can one check whether a given git repository uses SHA-256 or SHA-1 for its commit hashes?

Convolve answered 24/1, 2021 at 12:26 Comment(7)
@mkrieger1 unfortunately no. I have seen multiple documents and articles regarding this move from 2017, but it's 2021 now. My question is whether this is now enabled by default and, if not, how to use it. -- Convolve
But why? -- Anemoscope
@ArkadiuszDrabczyk because SHA-1 isn't really that secure anymore. If you want to use the commit hash to prove integrity, for example, SHA-1 wouldn't be considered secure in many contexts. -- Convolve
You might note that all constructed hashcode collisions in ever-considered-secure hashes are in .pdf's (and perhaps similar formats, but I think they're all .pdf's). That's because humans don't look at .pdf's directly, they look at a rendering, and you can hide a colossal amount of bullshit in a .pdf: the sort that it takes to steer a good hash code into producing a collision. Anybody trying to produce two snapshots that both look sensible to a human eye has a much, much more daunting task in front of them. -- Gynecic
Agreed, but there is sometimes legislation that puts requirements on cryptographic security. In particular, I would like to prove that commits existed at a certain time by timestamping them using a (governmentally trusted) TSA, and the corresponding legislation that defines what is trusted specifies that hashes must be at least SHA-256. -- Convolve
That's completely false. Signing commits or tags relies on SHA-1 for security with the Merkle tree. -- Whittling
@Chris "signing a commit" means signing its hash. If you sign a SHA-1 hash to prove something and certain rules require at least SHA-256 for hash-based proofs, then that won't work. -- Convolve

Whether to use SHA-1 or SHA-256 is a per-repository setting in recent versions of Git. The plan is eventually to make it possible to store data in a repository in SHA-256 and access the objects with either the SHA-1 name or the SHA-256 name. SHA-1 remains the default.

Do note that the SHA-256 mode is experimental and could theoretically change but there are no plans to do so. The Git developers are making every effort to not break compatibility with existing SHA-256 repositories.

To create a new repository with SHA-256, use the --object-format option to git init. If you want to know which algorithm a local repository uses, run git rev-parse --show-object-format, which will output either sha1 or sha256. To see the hash of a remote repository, you can use git ls-remote and verify the length of the hashes printed.
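The length check mentioned for git ls-remote can be sketched as follows: a SHA-1 object name is 40 hexadecimal characters, a SHA-256 one is 64. This is only an illustration. The helper name guess_object_format and the sample line are assumptions made up for this example and are not part of Git.

```python
import re

# Hex lengths of object names: SHA-1 is 160 bits, SHA-256 is 256 bits.
HEX_LENGTHS = {40: "sha1", 64: "sha256"}


def guess_object_format(ls_remote_output):
    """Guess the object format from text shaped like `git ls-remote` output.

    Each non-empty line is expected to start with a hex object name,
    followed by whitespace and a ref name.
    """
    formats = set()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        oid = line.split()[0]
        if not re.fullmatch(r"[0-9a-f]+", oid) or len(oid) not in HEX_LENGTHS:
            raise ValueError(f"unrecognized object name: {oid!r}")
        formats.add(HEX_LENGTHS[len(oid)])
    if len(formats) != 1:
        raise ValueError("could not determine a single object format")
    return formats.pop()


# Hypothetical sample line; the object name is made up for illustration.
sample = "a" * 64 + "\trefs/heads/main\n"
print(guess_object_format(sample))
```

For a local repository, git rev-parse --show-object-format remains the direct way to ask; the length heuristic is only useful when all you have is a list of hashes.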

Do note that, as of this answer (January 2021), no major forge supports SHA-256, so such repositories cannot be uploaded to them.

Whittling answered 24/1, 2021 at 18:48 Comment(2)
Thank you! This is very relevant information. Do you know whether there is any ETA for making this feature non-experimental in git, and for when it might be supported by GitHub? This seems to be a feature that has been cooking for quite a while already. -- Convolve
I'm the primary person working on it on the Git side and it happens in my free time, so it's hard to say. Interoperability is coming, but the work is slow. As for GitHub, I can't speak to the product roadmap. If you want to see it (especially if you're a corporate user), let GitHub Support know, since GitHub tracks customer feedback and feature requests. -- Whittling

According to the man page for git-init for version 2.30.0, the sha-256 support is still considered experimental:

--object-format=<format>

    Specify the given object format (hash algorithm) for the
    repository. The valid values are sha1 and (if enabled) sha256.
    sha1 is the default.

    THIS OPTION IS EXPERIMENTAL! SHA-256 support is experimental and
    still in an early stage. A SHA-256 repository will in general not
    be able to share work with "regular" SHA-1 repositories. It should
    be assumed that, e.g., Git internal file formats in relation to
    SHA-256 repositories may change in backwards-incompatible ways.
    Only use --object-format=sha256 for testing purposes.
Alpestrine answered 24/1, 2021 at 13:41 Comment(1)
Worked for me:

    mkdir sha256 && cd sha256
    git init --object-format=sha256
    dd if=/dev/zero bs=$((1024*1024)) count=$((5*1024)) | git hash-object --stdin --literally

Result: dc18ca621300c8d3cfa505a275641ebab00de189859e022a975056882d313e64, on WSL Ubuntu, git version 2.33.1. -- Runin

Another answer here states: "According to the man page for git-init for version 2.30.0, the sha-256 support is still considered experimental."

Actually, Git 2.42 (Q3 2023) tones down the warning on SHA-256 repositories being an experimental curiosity.
There is no support (yet) for them to interoperate with traditional SHA-1 repositories, but at this point, there is no plan to make breaking changes to SHA-256 repositories and there is no longer need for such a strongly phrased warning.

See commit 8e42eb0 (31 Jul 2023) by Adam Majer (AdamMajer).
(Merged by Junio C Hamano -- gitster -- in commit e48d9c7, 07 Aug 2023)

doc: sha256 is no longer experimental

Signed-off-by: Adam Majer

Remove scary wording that basically stops people using sha256 repositories not because of interoperability issues with sha1 repositories, but from fear that their work will suddenly become incompatible in some future version of git.

We should be clear that currently sha256 repositories will not work with sha1 repositories but stop the scary words.

git now includes in its man page:

is always used. The default is "sha1". See --object-format in git init.


Git 2.43 (Q4 2023) adds more details about the SHA-256 hash: the "streaming" interface used for the bulk-checkin codepath has been narrowed to take only blob objects for now, with no real loss of functionality.

See commit 9eb5419 (26 Sep 2023) by Eric W. Biederman (ebiederm).
(Merged by Junio C Hamano -- gitster -- in commit 3df51ea, 10 Oct 2023)

bulk-checkin: only support blobs in index_bulk_checkin

Inspired-by: brian m. carlson
Signed-off-by: "Eric W. Biederman"

As the code is written today index_bulk_checkin only accepts blobs.
Remove the enum object_type parameter and rename index_bulk_checkin to index_blob_bulk_checkin, index_stream to index_blob_stream, deflate_to_pack to deflate_blob_to_pack, stream_to_pack to stream_blob_to_pack, to make this explicit.

Not supporting commits, tags, or trees has no downside as it is not currently supported now, and commits, tags, and trees being smaller by design do not have the problem that index_bulk_checkin was built to solve.

Before we start adding code to support the hash function transition supporting additional objects types in index_bulk_checkin has no real additional cost, just an extra function parameter to know what the object type is.
Once we begin the hash function transition this is not the case.

The hash function transition document specifies that a repository with compatObjectFormat enabled will compute and store both the SHA-1 and SHA-256 hash of every object in the repository.

What makes this a challenge is that it is not just an additional hash over the same object.
Instead the hash function transition document specifies that the compatibility hash (specified with compatObjectFormat) be computed over the equivalent object that another git repository whose storage hash (specified with objectFormat) would store.
When comparing equivalent repositories built with different storage hash functions, the oids embedded in objects used to refer to other objects differ and the location of signatures within objects differ.

As blob objects have neither oids referring to other objects nor stored signatures their storage hash and their compatibility hash are computed over the same object.

The other kinds of objects: trees, commits, and tags, all store oids referring to other objects.
Signatures are stored in commit and tag objects.
As oids and the tags to store signatures are not the same size in repositories built with different storage hashes, the sizes of the equivalent objects are also different.

A version of index_bulk_checkin that supports more than just blobs when computing both the SHA-1 and the SHA-256 of every object added would need a different, and more expensive structure.
The structure is more expensive because it would be required to temporarily buffer the equivalent object the compatibility hash needs to be computed over.

A temporary object is needed, because before a hash over an object can be computed its object header needs to be computed.
One of the members of the object header is the entire size of the object.
To know the size of an equivalent object an entire pass over the original object needs to be made, as trees, commits, and tags are composed of a variable number of variable sized pieces.
Unfortunately there is no formula to compute the size of an equivalent object from just the size of the original object.

Avoid all of those future complications by limiting index_bulk_checkin to only work on blobs.
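To make the size argument in that commit message concrete, here is a rough sketch in Python. It assumes the usual loose-object layout (a header of the form "<type> <size>" plus a NUL byte, then the body) and the usual tree entry layout ("<mode> <name>", a NUL byte, then the raw object name). The function names object_id and tree_body_size are made up for this example, and the sketch does not translate the object names embedded in a tree, which a real compatibility hash would also have to do.

```python
import hashlib

# Bytes per raw (binary) object name under each storage hash.
RAW_OID_SIZE = {"sha1": 20, "sha256": 32}


def object_id(obj_type, body, algo):
    """Hash "<type> <size>" + NUL + body with the chosen algorithm.

    The size is part of the header, so it must be known before
    hashing can start.
    """
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def tree_body_size(entries, algo):
    """Size in bytes of a tree body with the given (mode, name) entries.

    Each entry is "<mode> <name>" + NUL + a raw object name, so the
    total depends on which hash the repository stores.
    """
    return sum(
        len(f"{mode} {name}".encode()) + 1 + RAW_OID_SIZE[algo]
        for mode, name in entries
    )


# A blob embeds no object names: both hashes cover the same bytes.
blob = b"hello\n"
print(object_id("blob", blob, "sha1"))
print(object_id("blob", blob, "sha256"))

# A tree embeds one object name per entry, so its size (and therefore
# its header) differs between SHA-1 and SHA-256 repositories.
entries = [("100644", "README"), ("40000", "src")]
print(tree_body_size(entries, "sha1"), tree_body_size(entries, "sha256"))
```

For a blob the two hashes run over identical input, which is why streaming it is cheap. For a tree the body grows by 12 bytes per entry under SHA-256, so the equivalent object has a different size and header, and it cannot be hashed without first walking every entry.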


With Git 2.46 (Q3 2024), batch 15, it is confirmed: the future Git 3.0 will use SHA-256 as the default object format for new repositories.

See commit 028bb23, commit fcf0f48, commit 6ccf041, commit 57ec925 (14 Jun 2024) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 166cdd8, 20 Jun 2024)

BreakingChanges: document upcoming change from "sha1" to "sha256"

Signed-off-by: Patrick Steinhardt

Starting with 8e42eb0 ("doc: sha256 is no longer experimental", 2023-07-31, Git v2.42.0-rc1 -- merge), the "sha256" object format is no longer considered to be experimental.
Furthermore, the SHA-1 hash function is actively recommended against by for example NIST and FIPS 140-2, and attacks against it are becoming more practical both due to new weaknesses (SHAppening, SHAttered, Shambles) and due to the ever-increasing computing power.
It is only a matter of time before it can be considered to be broken completely.

Let's plan for this event by being active instead of waiting for it to happen and announce that the default object format is going to change from "sha1" to "sha256" with Git 3.0.

All major Git implementations (libgit2, JGit, go-git) support the "sha256" object format and are thus prepared for this change.
The most important missing piece in the puzzle is support in forges.
While GitLab recently gained experimental support for the "sha256" object format, to the best of my knowledge GitHub doesn't support it yet.
Ideally, announcing this upcoming change will encourage forges to start building that support.

BreakingChanges now includes in its man page:

The default hash function for new repositories will be changed from "sha1" to "sha256".
SHA-1 has been deprecated by NIST in 2011 and is nowadays recommended against in FIPS 140-2 and similar certifications.

Furthermore, there are practical attacks on SHA-1 that weaken its cryptographic properties:

  • The SHAppening (2015). The first demonstration of a practical attack against SHA-1 with 2^57 operations.
  • SHAttered (2017). Generation of two valid PDF files with 2^63 operations.
  • Birthday-Near-Collision (2019). This attack allows for chosen prefix attacks with 2^68 operations.
  • Shambles (2020). This attack allows for chosen prefix attacks with 2^63 operations.

While we have protections in place against known attacks, it is expected that more attacks against SHA-1 will be found by future research. Paired with the ever-growing capability of hardware, it is only a matter of time before SHA-1 will be considered broken completely. We want to be prepared and will thus change the default hash algorithm to "sha256" for newly initialized repositories.

An important requirement for this change is that the ecosystem is ready to support the "sha256" object format. This includes popular Git libraries, applications and forges.

There is no plan to deprecate the "sha1" object format at this point in time.

Cf. patches.

object-format-disclaimer now includes in its man page:

Note: At present, there is no interoperability between SHA-256 repositories and SHA-1 repositories.

Historically, we warned that SHA-256 repositories may later need backward incompatible changes when we introduce such interoperability features. Today, we only expect compatible changes.
Furthermore, if such changes prove to be necessary, it can be expected that SHA-256 repositories created with today's Git will be usable by future versions of Git without data loss.

Mady answered 10/8, 2023 at 20:34 Comment(0)
