Note: you can ask git rev-parse --short for the shortest SHA1 that is still unique.
See "git get short hash from regular hash".
git rev-parse --short=4 921103db8259eb9de72f42db8b939895f5651489
92110
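As you can see in my example, the abbreviated SHA1 has a length of 5 even though I asked for 4: the value given to --short is only a minimum, and Git lengthens the result until it is unique in the repository.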
For big repos, 7 hex digits have not been enough since 2010; see commit dce9648 by Linus Torvalds himself (git 1.7.4.4, Oct 2010):
The default of 7 comes from fairly early in git development, when seven hex digits was a lot (it covers about 250+ million hash values).
Back then I thought that 65k revisions was a lot (it was what we were about to hit in BK), and each revision tends to be about 5-10 new objects or so, so a million objects was a big number.
(BK = BitKeeper)
These days, the kernel isn't even the largest git project, and even the kernel has about 220k revisions (much bigger than the BK tree ever was) and we are approaching two million objects.
At that point, seven hex digits is still unique for a lot of them, but when we're talking about just two orders of magnitude difference between number of objects and the hash size, there will be collisions in truncated hash values.
It's no longer even close to unrealistic - it happens all the time.
We should both increase the default abbrev that was unrealistically small, and add a way for people to set their own default per-project in the git config file.
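To put rough numbers on that argument, here is a minimal sketch of the usual birthday-style estimate. It assumes object names are uniformly distributed, and the function name expected_colliding_pairs is only an illustration, not anything Git provides.

```shell
# Expected number of object pairs that share the same abbreviated prefix,
# assuming uniformly distributed hashes: n * (n - 1) / 2 / 16^hexdigits.
# Integer arithmetic only, so the result is rounded down.
expected_colliding_pairs() {
    objects=$1     # number of objects in the repository
    hexdigits=$2   # abbreviation length in hex digits
    space=$(( 1 << (4 * hexdigits) ))   # 16^hexdigits possible prefixes
    echo $(( objects * (objects - 1) / 2 / space ))
}

expected_colliding_pairs 2000000 7    # prints 7450
expected_colliding_pairs 2000000 12   # prints 0
```

With two million objects, seven hex digits give thousands of expected colliding pairs, while twelve give essentially none, which is consistent with the "it happens all the time" remark above.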
core.abbrev
Set the length object names are abbreviated to.
If unspecified, many commands abbreviate to 7 hexdigits, which may not be enough for abbreviated object names to stay unique for sufficiently long time.
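As a sketch of how that setting can be used (the value 12 is only an illustrative choice, and the commands assume they are run inside an existing repository with at least one commit):

```shell
# Set the abbreviation length for the current repository only.
git config core.abbrev 12

# Abbreviated object names in this repository now use at least 12 hexdigits.
git log --oneline -3

# A one-off request for a specific minimum length, without touching the config.
git rev-parse --short=10 HEAD
```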
environment.c:
int minimum_abbrev = 4, default_abbrev = 7;
Note: as commented below by marco.m, core.abbrevLength was renamed to core.abbrev in that same Git 1.7.4.4, in commit a71f09f:
Rename core.abbrevlength back to core.abbrev
It corresponds to --abbrev=$n command line option after all.
More recently, Linus added an automatically sized default abbreviation in commit e6c587c (for Git 2.11, Q4 2016), as mentioned in Matthieu Moy's answer:
In fairly early days we somehow decided to abbreviate object names down to 7-hexdigits, but as projects grow, it is becoming more and more likely to see such a short object names made in earlier days and recorded in the log messages no longer unique.
Currently the Linux kernel project needs 11 to 12 hexdigits, while Git itself needs 10 hexdigits to uniquely identify the objects they have, while many smaller projects may still be fine with the original 7-hexdigit default. One-size does not fit all projects.
Introduce a mechanism, where we estimate the number of objects in the repository upon the first request to abbreviate an object name with the default setting and come up with a sane default for the repository. Based on the expectation that we would see collision in a repository with 2^(2N) objects when using object names shortened to first N bits, use sufficient number of hexdigits to cover the number of objects in the repository.
Each hexdigit (4-bits) we add to the shortened name allows us to have four times (2-bits) as many objects in the repository.
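The rule described in that commit message can be sketched as follows. This is an illustration of the idea (one hexdigit per two bits of object count, never below the historical 7), not a copy of Git's actual code; the function name estimate_abbrev_len is hypothetical.

```shell
# Estimate an abbreviation length from an object count:
# count the bits needed to represent the number of objects, then use
# one hexdigit per two bits (rounded up), with a floor of 7 hexdigits.
estimate_abbrev_len() {
    count=$1
    bits=0
    while [ "$count" -gt 0 ]; do
        count=$(( count >> 1 ))
        bits=$(( bits + 1 ))
    done
    len=$(( (bits + 1) / 2 ))
    if [ "$len" -lt 7 ]; then
        len=7   # small repositories keep the original default
    fi
    echo "$len"
}

estimate_abbrev_len 1000      # prints 7
estimate_abbrev_len 2000000   # prints 11
estimate_abbrev_len 5000000   # prints 12
```

Under these assumptions, quadrupling the number of objects adds one hexdigit, which matches the "four times (2-bits) as many objects" remark above.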
See commit e6c587c (01 Oct 2016) by Linus Torvalds (torvalds).
See commit 7b5b772, commit 65acfea (01 Oct 2016) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit bb188d0, 03 Oct 2016)
That new property (guessing a reasonable default for the SHA1 abbrev value) has a direct effect on how Git computes its own version number for a release.