I recently found a posting from 2013-04-29 in a BSD discussion group at
http://openbsd-archive.7691.n7.nabble.com/Why-does-OpenBSD-use-CVS-td226952.html
where the poster claims:
"I ran into a hash collision once, using git rebase."
Unfortunately, he provides no proof for his claim. But maybe you would like to try contacting him and asking him about this supposed incident.
But on a more general level: due to the birthday attack, a random SHA-1 hash collision only becomes likely once about pow(2, 80) hashes have been generated.
This sounds like a lot and is certainly way more than the total number of versions of individual files present in all Git repositories of the world combined.
However, that comparison only counts the versions which actually remain in version history.
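To make the birthday figure a bit more tangible, here is a minimal sketch of the usual birthday approximation. It assumes SHA-1 behaves like a uniformly random 160-bit function (which ignores deliberate attacks), and the function name `collision_probability` is just my own choice for illustration.

```python
import math

SHA1_BITS = 160  # SHA-1 produces 160-bit digests


def collision_probability(num_hashes: int, bits: int = SHA1_BITS) -> float:
    """Approximate chance that at least two of num_hashes random values collide.

    Uses the birthday approximation p ~= 1 - exp(-n * (n - 1) / (2 * N)),
    where N = 2**bits is the number of possible hash values.
    """
    space = 2 ** bits
    exponent = num_hashes * (num_hashes - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # At 2**80 hashes the chance is roughly 0.39, i.e. a collision is plausible.
    print(collision_probability(2 ** 80))
    # Far below 2**80 the chance is vanishingly small.
    print(collision_probability(2 ** 47))
```

So pow(2, 80) is better read as "the number of hashes at which a collision becomes plausible" than as the odds of any single collision.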
If a developer relies heavily on rebasing, every time a rebase is run for a branch, all the commits in all the versions of that branch (or the rebased part of the branch) get new hashes. The same is true for every file modified with "git filter-branch". Therefore, "rebase" and "filter-branch" might be big multipliers for the number of hashes generated over time, even though not all of them are actually kept: frequently, after rebasing (especially for the purpose of "cleaning up" a branch), the original branch is thrown away.
But if the collision occurs during the rebase or filter-branch, it can still have adverse effects.
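Why a rebase produces new hashes at all can be sketched in a few lines. Git names an object by the SHA-1 of a small header plus the object's content, and a commit's content includes the hash of its parent. The author line, the placeholder parent IDs and the helper names below are made up for illustration; only the header format follows what Git actually does.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    """Build a simplified commit body; the identity line is a hypothetical example."""
    ident = "A U Thor <author@example.com> 1367193600 +0000"
    text = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n{message}\n"
    )
    return text.encode()


tree = git_object_id("tree", b"")  # the empty tree
old_base = "1" * 40                # placeholder ID of the old parent commit
new_base = "2" * 40                # placeholder ID of the parent after rebasing

# Same tree, same message, same author: only the parent differs.
original = git_object_id("commit", commit_body(tree, old_base, "Fix typo"))
rebased = git_object_id("commit", commit_body(tree, new_base, "Fix typo"))

# A child commit inherits the change, so the new hashes cascade down the branch.
child_original = git_object_id("commit", commit_body(tree, original, "Add test"))
child_rebased = git_object_id("commit", commit_body(tree, rebased, "Add test"))

print(original != rebased, child_original != child_rebased)
```

Every commit on the rebased part of the branch therefore gets a fresh hash, even if no file content changed.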
Another thing would be to estimate the total number of hashed entities in git repositories and see how far they are from pow(2, 80).
Let's say we have about 8 billion people, and all of them run git and keep their stuff versioned in 100 git repositories per person. Let's further assume the average repository has 100 commits and 10 files, and only one of those files changes per commit.
For every revision we have at least a hash for the tree object and the commit object itself. Together with the changed file we have 3 hashes per revision, and thus 300 hashes per repository.
For 100 repositories of 8 billion people this gives about 2.4 * pow(10, 14) hashes, which lies between pow(2, 47) and pow(2, 48) and is still far from pow(2, 80).
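The back-of-the-envelope estimate above can be reproduced like this. The constants are simply the assumptions from the text (8 billion people, 100 repositories each, 100 commits per repository, 3 hashes per commit); the variable names are my own.

```python
import math

PEOPLE = 8_000_000_000     # assumed world population, all using git
REPOS_PER_PERSON = 100     # assumed repositories per person
COMMITS_PER_REPO = 100     # assumed commits per repository
HASHES_PER_COMMIT = 3      # commit object + tree object + one changed file

total_hashes = PEOPLE * REPOS_PER_PERSON * COMMITS_PER_REPO * HASHES_PER_COMMIT
log2_total = math.log2(total_hashes)

print(f"total hashes: {total_hashes:.3e}")
print(f"that is about 2**{log2_total:.1f}, versus the birthday bound of 2**80")
```

Like the estimate in the text, this ignores the hashes of the unchanged files, which would not change the order of magnitude much.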
However, this does not include the supposed multiplication effect mentioned above, because I am uncertain how to include it in this estimation. Maybe it could increase the chances for a collision considerably, especially if very large repositories with a long commit history (like the Linux kernel) are rebased by many people for small changes, which nevertheless create different hashes for all affected commits.
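I cannot quantify the multiplier itself, but one can at least turn the question around and ask how large it would have to be. The sketch below assumes the estimate from above (8 billion people times 100 repositories times 300 hashes, about 2.4 * pow(10, 14)) and treats pow(2, 80) as the point where a collision becomes plausible; the names are hypothetical.

```python
import math

ESTIMATED_HASHES = 8_000_000_000 * 100 * 300  # estimate from the text
BIRTHDAY_BOUND = 2 ** 80                      # hashes at which a collision gets plausible

# Factor by which rebase/filter-branch would have to inflate the hash count.
required_multiplier = BIRTHDAY_BOUND / ESTIMATED_HASHES

print(f"required multiplier: {required_multiplier:.2e}")
print(f"that is about 2**{math.log2(required_multiplier):.1f}")
```

Under these assumptions, every stored hash would need to be regenerated several billion times by rebasing before the total gets near the birthday bound.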
I've been informed by the git Gods that the chances of a SHA1 collision is the same as the Earth being sucked up into the black hole created by the CERN accelerator. If this is indeed true, then there's no need for that extra memcmp.
, source: lwn.net/Articles/307281 – Counterchange