Git pack filenames -- what is the digest?

Git stores individual objects in .git/objects/ab/cdefgh..., where ab is the first byte (two hex digits) of the object's SHA-1 digest and cdefgh... is the remainder.

However, pack files don't follow the same naming policy, and I can find no documentation on how they are named. Any insights?
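
For comparison, the loose-object path described above can be derived as in this minimal Python sketch. It assumes a SHA-1 repository (SHA-256 repositories use longer names), and the function name `loose_object_path` is hypothetical. Git hashes a short header followed by the content, then splits the hex digest after its first byte.

```python
import hashlib

def loose_object_path(obj_type: str, content: bytes) -> str:
    """Return where a loose object lives, relative to the .git directory.

    Assumes a SHA-1 repository; SHA-256 repositories use longer names.
    """
    # Git hashes the header "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    digest = hashlib.sha1(header + content).hexdigest()
    # The first byte (two hex digits) picks the fan-out directory;
    # the remaining 38 hex digits are the file name inside it.
    return f"objects/{digest[:2]}/{digest[2:]}"

print(loose_object_path("blob", b""))
# objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Only the path is derived here; the file at that path holds the zlib-compressed header and content.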

Disembark asked 29/3, 2011 at 8:47

The pack files are kept in objects/pack, as documented in gitrepository-layout. Within this directory they are stored in pairs, an index file and the pack file itself, with names such as:

pack-a862cfa8b080773290073999c800a2e655ef9b5d.idx
pack-a862cfa8b080773290073999c800a2e655ef9b5d.pack
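
The pairing can be checked mechanically by grouping directory entries on the hex digest in their names. This is a minimal sketch assuming a SHA-1 repository (40 hex digits); `group_pack_names` is a hypothetical helper. Companion files such as .keep or .bitmap are deliberately not matched.

```python
import re
from collections.abc import Iterable

# Matches "pack-<40 hex digits>.<pack|idx>"; assumes a SHA-1 repository
# (SHA-256 repositories would use 64 hex digits).
PACK_NAME = re.compile(r"pack-([0-9a-f]{40})\.(pack|idx)")

def group_pack_names(names: Iterable[str]) -> dict[str, set[str]]:
    """Map each digest to the extensions seen for it (ideally {'pack', 'idx'})."""
    groups: dict[str, set[str]] = {}
    for name in names:
        m = PACK_NAME.fullmatch(name)
        if m:  # skip anything that is not a .pack or .idx file
            groups.setdefault(m.group(1), set()).add(m.group(2))
    return groups
```

A typical call would be `group_pack_names(os.listdir(".git/objects/pack"))`. Any digest whose set lacks one of the two extensions points to an incomplete pair.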

How the SHA1sum in those filenames is calculated is explained in the git-pack-objects documentation (my emphasis):

Write into a pair of files (.pack and .idx), using <base-name> to determine the name of the created file. When this option is used, the two files are written in <base-name>-<SHA1>.<pack,idx> files. <SHA1> is a hash of the sorted object names to make the resulting filename based on the pack content, and written to the standard output of the command.

The object names are the SHA1sums of the objects within the pack file.
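
The quoted wording can be read as the following calculation. This is a minimal sketch, not git's code. It assumes each 40-hex-digit object name is fed to SHA-1 as its raw 20 bytes, in ascending order, with no separators. The helper `legacy_pack_name` is hypothetical. It reflects the scheme the documentation described at the time, which, as noted further down, git no longer uses.

```python
import hashlib

def legacy_pack_name(object_names: list[str]) -> str:
    """Digest of the sorted object names, as the quoted documentation describes.

    Assumption: each 40-hex-digit name is hashed as its raw 20 bytes,
    in ascending order, with no separators between them.
    """
    ctx = hashlib.sha1()
    # Lower-case, fixed-length hex sorts in the same order as the raw bytes.
    for hex_name in sorted(name.lower() for name in object_names):
        ctx.update(bytes.fromhex(hex_name))
    return ctx.hexdigest()
```

The object names contained in a pack can be listed with `git verify-pack -v` on its .idx file. Because the names are sorted first, the result does not depend on the order in which objects were written.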

Fanfaronade answered 29/3, 2011 at 9:07. Comments:
Some filesystems perform poorly with a large number of entries in a single directory. Loose objects “fan out” into objects/[0-9a-f][0-9a-f]/ to limit the number of entries each directory has to hold. If loose objects were stored directly under objects/, that directory could end up with many thousands of entries before they were automatically packed. objects/pack/ does not fan out because it typically needs to store only a very small number of files. - Towardly
This was correct at the time of writing but changed in 2013 (github.com/git/git/commit/1190a1ac, github.com/git/git/commit/40a4f5a7). The name is now the trailer checksum, i.e. the hexadecimal encoding of the last 20 bytes of the file. - Larrup

The answer is either "the SHA-1 hash of the entire pack file, minus the last 20 bytes" or "the hexadecimal encoding of the last 20 bytes"; the two are equivalent.

The last 20 bytes of the file are the "trailer checksum", which is itself a SHA-1 hash of everything in the file before those 20 bytes.
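
The equivalence can be checked directly. This minimal Python sketch hashes everything except the final 20 bytes, compares the result with the trailer, and returns the trailer as hex. It assumes a SHA-1 repository (20-byte trailer). The function name `pack_trailer_name` and its `chunk_size` parameter are hypothetical.

```python
import hashlib
import os

def pack_trailer_name(pack_path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex trailer checksum of a .pack file after verifying it.

    Assumes a SHA-1 repository, i.e. a 20-byte trailer.
    """
    size = os.path.getsize(pack_path)
    if size < 20:
        raise ValueError(f"{pack_path}: too short to hold a trailer")
    remaining = size - 20  # bytes covered by the trailer checksum
    ctx = hashlib.sha1()
    with open(pack_path, "rb") as f:
        # Hash everything before the trailer in chunks, so large packs
        # are never loaded into memory all at once.
        while remaining:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError(f"{pack_path}: file shrank while reading")
            ctx.update(chunk)
            remaining -= len(chunk)
        trailer = f.read(20)
    if ctx.digest() != trailer:
        raise ValueError(f"{pack_path}: trailer checksum mismatch")
    return trailer.hex()
```

For a pack named under the newer scheme, `os.path.basename(path) == f"pack-{pack_trailer_name(path)}.pack"` should hold. As the commit message quoted below warns, this is an observation about current behaviour rather than a guarantee.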

This was changed in 2013; previously the name was a SHA-1 hash of the sorted object names in the pack. Note that the documentation now simply reads "<SHA-1> is a hash based on the pack content". The author of the change explicitly does not guarantee how the SHA-1 is calculated (from the commit log: "Hopefully this will discourage readers from depending on the old or the new calculation.").

Larrup answered 1/4, 2016 at 10:15
