Is git's semi-secret empty tree object reliable, and why is there not a symbolic name for it?

Git has a well-known, or at least sort-of-well-known, empty tree whose SHA1 is:

4b825dc642cb6eb9a060e54bf8d69288fbee4904

(you can see this in any repo, even a newly created one, with git cat-file -t and git cat-file -p). [Edit made in 2020: the SHA-256 empty tree hash ID is:

6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321

as VonC mentions in his answer. My question was apparently about 8 years early. 😀]

If you work hard and are very careful you can sort of use this empty tree to store a directory that has no files (see answer to How do I add an empty directory to a git repository), although it's not really a great idea.

It's more useful as one argument to git diff-tree, which is how one of the sample hooks uses it.

What I'm wondering is,

  1. how reliable is this, i.e., will some future version of git not have a git object numbered 4b825dc642cb6eb9a060e54bf8d69288fbee4904?
  2. Why is there no symbolic name for the empty tree (or is there one?).

(A quick and dirty way to create a symbolic name is to put the SHA1 in, e.g., .git/Nulltree. Unfortunately you have to do this for every repo. Seems better to just put the magic number in scripts, etc. I just have a general aversion to magic numbers.)

Hessenassau answered 19/3, 2012 at 5:32 Comment(7)
just to remember the hash ;-) use SHA1("tree 0\0") = 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (\0 is NUL character) – Overwhelm
@Thomas: the git hash-object -t tree /dev/null method (from VonC's answer below) has the advantage of not hard-coding SHA-1, in case some future version of git switches to SHA-2 for instance. (I'm not going to attempt to predict when that might happen. :-) It would be easier to switch Mercurial to SHA-2, since they left room for it.) – Hessenassau
of course you are right, but it is a good piece of "Useless Knowledge" and maybe it is helpful in some case to someone else?! – Overwhelm
@Thomas: looks like the hash algorithm changeover might happen sooner than expected. :-) – Hessenassau
Speaking of "some future version of Git", I think you will be interested in my latest (Dec. 2017) edit to my 2012 answer: stackoverflow.com/revisions/9766506/7 – Myasthenia
Beware: 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321 will soon be the new 4b825dc642cb6eb9a060e54bf8d69288fbee4904 empty tree hash. See my edited answer below. – Myasthenia
For some reason, prefixes of this hash don't work, e.g. git show 4b825d doesn't work, but git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904 does. – Volturno

This thread mentions:

If you don't remember the empty tree sha1, you can always derive it with:

git hash-object -t tree /dev/null

Or, as Ciro Santilli proposes in the comments:

printf '' | git hash-object --stdin -t tree

Or, as seen here, from Colin Schimmelfing:

git hash-object -t tree --stdin < /dev/null

So I guess it is safer to define a variable holding the result of that command as your empty tree SHA1 (instead of relying on a "well-known value").
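A minimal sketch of that idea, assuming a POSIX shell and a throwaway repository created with mktemp (the names repo and empty_tree are illustrative, not from the thread):

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so the example does not touch real data.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Ask git for the empty tree ID instead of hard-coding 4b825dc6...
# The result follows the hash algorithm of the repository.
empty_tree=$(git hash-object -t tree /dev/null)

# The ID resolves to a tree object even in a brand-new repository.
git cat-file -t "$empty_tree"
echo "$empty_tree"
```

A script can then pass "$empty_tree" wherever it would otherwise embed the literal hash, for example as the first argument to git diff-tree.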

Note: Git 2.25.1 (Feb. 2020) proposes in commit 9c8a294:

empty_tree=$(git mktree </dev/null)

# Windows (Command Prompt):
git mktree <NUL

# Windows (PowerShell):
$null | git mktree

And adds:

As a historical note, the function now known as repo_read_object_file() was taught the empty tree in 346245a1bb ("hard-code the empty tree object", 2008-02-13, Git v1.5.5-rc0 -- merge), and the function now known as oid_object_info() was taught the empty tree in c4d9986f5f ("sha1_object_info: examine cached_object store too", 2011-02-07, Git v1.7.4.1).


Note, you will see that SHA1 pop up in some GitHub repos when the author wants the first commit to be empty (see blog post "How I initialize my Git repositories"):

$ GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000" GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000" git commit --allow-empty -m 'Initial commit'

Will give you:

[Image: Empty tree SHA1]

(See the tree SHA1?)

You can even rebase your existing history on top of that empty commit (see "git: how to insert a commit as the first, shifting all the others?")

In both cases, you don't rely on the exact SHA1 value of that empty tree.
You simply follow a best practice, initializing your repo with a first empty commit.


To do that:

git init my_new_repo
cd my_new_repo
git config user.name username
git config user.email email@com

git commit --allow-empty -m "initial empty commit"

That will generate a commit with a SHA1 specific to your repo, username, email, date of creation (meaning the SHA1 of the commit itself will be different every time).
But the tree referenced by that commit will be 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree SHA1.

git log --pretty=raw

commit 9ed4ff9ac204f20f826ddacc3f85ef7186d6cc14
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904      <====
author VonC <[email protected]> 1381232247 +0200
committer VonC <[email protected]> 1381232247 +0200

    initial empty commit

To show just the tree of a commit (display the commit tree SHA1):

git show --pretty=format:%T 9ed4ff9ac204f20f826ddacc3f85ef7186d6cc14
4b825dc642cb6eb9a060e54bf8d69288fbee4904

If that commit, referencing an empty tree, is indeed your first commit, you can show that empty tree SHA1 with:

git log --pretty=format:%h --reverse | head -1 | xargs git show --pretty=format:%T
4b825dc642cb6eb9a060e54bf8d69288fbee4904

(and that even works on Windows, with Gnu On Windows commands)


As commented below, using git diff <commit> HEAD, this will show all the files in the current branch HEAD:

git diff --name-only 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD
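As a rough illustration, the sketch below builds a throwaway repository, adds two files, deletes one, and then diffs the derived empty tree against HEAD. The file names, the placeholder identity, and the variable names are assumptions made for the example only:

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# First commit adds two files; second commit removes one of them.
echo one > kept.txt
echo two > removed.txt
git add kept.txt removed.txt
git -c user.name=example -c user.email=example@example.invalid \
    commit -q -m 'add two files'
git rm -q removed.txt
git -c user.name=example -c user.email=example@example.invalid \
    commit -q -m 'remove one file'

# Derive the empty tree ID rather than hard-coding it.
empty_tree=$(git hash-object -t tree /dev/null)

# Diffing the empty tree against HEAD lists only the files HEAD contains now.
files=$(git diff --name-only "$empty_tree" HEAD)
echo "$files"
```

Because the comparison is between two trees, files that were added and later deleted do not appear; only the content of the HEAD commit is listed.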

Note: that empty tree value is formally defined in cache.h.

#define EMPTY_TREE_SHA1_HEX \
    "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

Since Git 2.16 (Q1 2018), it is used in a structure which is no longer tied to (only) SHA1, as seen in commit eb0ccfd:

Switch empty tree and blob lookups to use hash abstraction

Switch the uses of empty_tree_oid and empty_blob_oid to use the current_hash abstraction that represents the current hash algorithm in use.

See more at "Why doesn't Git use more modern SHA?": the chosen successor is SHA-2 (specifically SHA-256), since Git 2.19 (Q3 2018).


With Git 2.25 (Q1 2020), the tests are being prepared for a SHA-2 transition, and that work involves the empty tree.

See commit fa26d5e, commit cf02be8, commit 38ee26b, commit 37ab8eb, commit 0370b35, commit 0253e12, commit 45e2ef2, commit 79b0edc, commit 840624f, commit 32a6707, commit 440bf91, commit 0b408ca, commit 2eabd38 (28 Oct 2019), and commit 1bcef51, commit ecde49b (05 Oct 2019) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 28014c1, 10 Nov 2019)

t/oid-info: add empty tree and empty blob values

Signed-off-by: brian m. carlson

The testsuite will eventually learn how to run using an algorithm other than SHA-1. In preparation for this, teach the test_oid family of functions how to look up the empty blob and empty tree values so they can be used.

So t/oid-info/hash-info now includes:

rawsz sha1:20
rawsz sha256:32

hexsz sha1:40
hexsz sha256:64

zero sha1:0000000000000000000000000000000000000000
zero sha256:0000000000000000000000000000000000000000000000000000000000000000

algo sha1:sha1
algo sha256:sha256

empty_blob sha1:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
empty_blob sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813

empty_tree sha1:4b825dc642cb6eb9a060e54bf8d69288fbee4904
empty_tree sha256:6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321

The SHA-256 value "6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321" is the new counterpart of the SHA-1 empty tree "4b825dc642cb6eb9a060e54bf8d69288fbee4904".
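Both values come from hashing the same bytes: an empty tree object is just the header "tree 0" followed by a NUL byte, as one of the comments on the question points out. A small sketch, assuming the GNU coreutils tools sha1sum and sha256sum are available (on other systems a tool such as shasum may be needed instead):

```shell
#!/bin/sh
# An empty tree object is the header "tree 0", a NUL byte, and no entries.
sha1_id=$(printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1)
sha256_id=$(printf 'tree 0\0' | sha256sum | cut -d ' ' -f 1)

echo "sha1:   $sha1_id"
echo "sha256: $sha256_id"
```

This needs no repository at all, which makes it handy for checking the constants, but a script that talks to git is usually better off asking git itself so that it follows the repository's hash algorithm.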

Myasthenia answered 19/3, 2012 at 7:35 Comment(12)
@torek: I have added some examples around the first empty commit best practice to illustrate that empty tree SHA1. – Myasthenia
Well, one of the goals is to use the "empty tree" hash as an argument to git diff-tree in some scripts I'm writing. There's no guarantee that there is an initial empty commit in the repo. So I'm just wondering if these scripts might end up breaking someday. – Hessenassau
If you pass -w to git hash-object, it'll create the object in the repository it's run against, and that would recreate the empty tree in the repository you're running against were it to ever go away in the future. – Quadruplex
If you want to go before the first commit using rebase, you can use git rebase --root – Forecourse
@Forecourse indeed: I mention it in https://mcmap.net/q/11670/-squash-the-first-two-commits-in-git-duplicate, with git 1.7.12+. – Myasthenia
Or if you prefer the magic of pipes instead of the magic of /dev/null: printf '' | git hash-object --stdin -t tree :) – Marguritemargy
I'm not adding much to your answer but just an FYI, an alternative way to derive it would be git mktree < /dev/null. Both git hash-object and this produce the same end result. – Vaccine
@Myasthenia So does this mean if I do a git diff between any branch and that special commit - it will show me what the repo looks like on that branch? – Noshow
@Noshow Yes. A git diff --name-only 4b825dc642cb6eb9a060e54bf8d69288fbee4904 would display the list of all the files present in the current branch HEAD. – Myasthenia
@Myasthenia Ah, thank you. My only problem is that it gives me all of the files that have ever existed in the repo. I only want the files that currently exist. For example if you add a file and then remove it in a future commit - this diff command will still show the file as added. – Noshow
@Noshow (and @VonC): git diff <options> <hash> compares the given hash, which may be the empty tree, to the current work-tree, not to the HEAD commit. You would need git diff <options> <hash> HEAD for the latter. – Hessenassau
@Hessenassau True. I have added the full command in the answer for more visibility. – Myasthenia

Here is an answer on how to create an empty-tree commit even when the repository is not empty: https://mcmap.net/q/12345/-creating-a-git-diff-from-nothing

But I prefer "empty" to be a tag rather than a branch. A simple way is:

git tag empty $(git hash-object -t tree /dev/null)

This works because a tag can point directly to a tree-ish, without a commit. Now, to list all the files in the working tree:

git diff --name-only empty

Or the same with stat:

git diff --stat empty

All files as diff:

git diff empty

Check whitespace in all files:

git diff --check empty
Cullender answered 1/3, 2019 at 13:17 Comment(8)
...but using the magic number in your tag creation is just brushing under the rug the very matter of the question (not using magic number SHA-1) – Lectionary
Not true. I used a tag to point to the tree-ish object. For now this tree-ish is identified by SHA-1; in the future it can be changed, for instance, to SHA-256 and so on (with repository migration). But the tag will be the same. :) The main feature of a tag is to point to the object. A tag can use SHA-1 internally or something else; it is a matter of Git internals only. – Cullender
I get that. But if you (or anyone reading this)(or a script, even worse) tries to apply it (your first line) at a later point it could fail on a new hash algorithm, where replacing your first line with an executed expression (producing this hash) would keep succeeding. – Lectionary
If you combine this with one of the methods of generating the empty tree hash automatically, you can future-proof this (as @RomainValeri suggests). However, if it were up to me, git rev-parse would have new flags or keywords or something along those lines, to produce (a) the empty tree hash and (b) the null-commit hash. Both of these would be useful in scripts and would protect against the proposed SHA-256 changes. – Hessenassau
Okay, changed. But this is no longer "the simplest way". :) – Cullender
@Cullender Anyway, your tag-based shortcuts seem very practical to me. Upvoted for pragmatism. – Lectionary
@Hessenassau Btw, such flag exists in: git read-tree --empty This will create an empty tree in the stage. But I don't know how this can be useful. – Cullender
@Olleg: yes, the problem is that git read-tree doesn't export it anywhere. Since rev-parse is sort of the Programmer's API for hash IDs, I think this is where these belong. – Hessenassau

I wrote up a blog post with two different ways of finding the hash: http://colinschimmelfing.com/blog/gits-empty-tree/

If it were to ever change for some reason, you could use the two ways below to find it. However, I would feel pretty confident using the hash in .bashrc aliases, etc., and I don't think it will change anytime soon. At the very least, a change would probably require a major release of git.

The two ways are:

  1. The answer above: git hash-object -t tree --stdin < /dev/null
  2. Simply initing an empty repo and then running git write-tree in that new repo - the hash will be output by git write-tree.
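The two ways can be compared side by side in a quick sketch, assuming a POSIX shell and a scratch repository made with mktemp (the variable names are illustrative):

```shell
#!/bin/sh
set -e

# A freshly initialized repository has nothing staged.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Way 2: with an empty index, write-tree prints the empty tree ID.
from_write_tree=$(git write-tree)

# Way 1: hash zero bytes of input as a tree object.
from_hash_object=$(git hash-object -t tree --stdin < /dev/null)

echo "$from_write_tree"
echo "$from_hash_object"
```

Both commands should print the same ID, whichever hash algorithm the repository was created with.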
Tuyettv answered 6/10, 2013 at 18:24 Comment(4)
Running the command with –-stdin gives me fatal: Cannot open '–-stdin': No such file or directory with git 2.7.2. However, running it without --stdin as in VonC's answer gives the hash value – Josephina
This answer isn't very useful now the blog-post is dead. Hence why we don't generally approve of these answers on SO. – Medic
@PhilipWhitehouse the blog post is not dead, but in any case I included the two ways in my answer - I agree that without including those two ways, it would not be a good answer. – Tuyettv
ref: git's Empty Tree @@ <web.archive.org/web/20220718025749/http://colinschimmelfing.com/…> – Ari
