Why is my git repository so big?

145M = .git/objects/pack/

I wrote a script that adds up the sizes of the diffs between each commit and the commit before it, walking backwards from the tip of each branch. I get 129 MB, and that is without compression and without accounting for identical files shared across branches or for history common to several branches.
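
For illustration, a rough sketch of the kind of measurement described above (this is not the original script; it just sums the byte size of each commit's diff against its first parent):

# Rough sketch of the measurement described above (not the original script):
# sum the uncompressed diff size of each commit against its first parent.
git rev-list --all --no-merges | while read commit; do
    git diff "$commit^" "$commit" 2>/dev/null | wc -c
done | awk '{ total += $1 } END { printf "%.1f MB\n", total / 1024 / 1024 }'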

Git takes all of those things into account, so I would expect a much, much smaller repository. So why is .git so big?

I've done:

git fsck --full
git gc --prune=today --aggressive
git repack

To answer the question about how many files/commits: I have 19 branches with about 40 files in each, and 287 commits, found using:

git log --oneline --all|wc -l

It should not take tens of megabytes to store information about this.

Sialagogue answered 22/6, 2009 at 23:52 Comment(5)
Linus recommends the following over aggressive gc. Does it make a significant difference? git repack -a -d --depth=250 --window=250 – Teage
thanks gbacon, but no difference. – Sialagogue
That's because you are missing the -f. metalinguist.wordpress.com/2007/12/06/… – Chard
git repack -a -d shrunk my 956MB repo to 250MB. Great success! Thanks! – Ulrick
One caveat I found: if you have git submodules, the .git repos of the submodules show up in the super module's .git directory, so du may be misleading about the super module being large when it is in fact a submodule, and the answers below need to be run in the submodule directory. – Thompkins
72

I recently pulled the wrong remote repository into the local one (git remote add ... and git remote update). After deleting the unwanted remote ref, branches and tags I still had 1.4GB (!) of wasted space in my repository. I was only able to get rid of this by cloning it with git clone file:///path/to/repository. Note that the file:// makes a world of difference when cloning a local repository - only the referenced objects are copied across, not the whole directory structure.

Edit: here's Ian's one-liner for recreating all the branches in the new repo:

d1=/path/to/original/repo    # original repo
d2=/path/to/new/repo         # new repo (must already exist)
cd "$d1"
for b in $(git branch | cut -c 3-)
do
    git checkout "$b"
    x=$(git rev-parse HEAD)
    cd "$d2"
    git checkout -b "$b" "$x"
    cd "$d1"
done
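
For completeness, a rough sketch of the clone step that precedes the loop above (the paths are placeholders, reusing the same d1/d2 variables):

# Clone over file:// so only reachable objects are copied, not the whole
# .git directory, then compare object counts with the original repository.
git clone file://"$d1" "$d2"
git -C "$d2" count-objects -v    # compare size-pack with the original repo
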
Comprehensible answered 24/6, 2009 at 4:40 Comment(8)
Wow. THANK YOU. .git = 15M now!! After cloning, here is a little one-liner for preserving your previous branches: d1=#original repo; d2=#new repo; cd $d1; for b in $(git branch | cut -c 3-); do git checkout $b; x=$(git rev-parse HEAD); cd $d2; git checkout -b $b $x; cd $d1; done – Sialagogue
If you check this, you could add the one-liner to your answer so it's formatted as code. – Sialagogue
I foolishly added a bunch of video files to my repo, and had to reset --soft HEAD^ and recommit. The .git/objects dir was huge after that, and this was the only way that got it back down. However, I didn't like the way the one-liner changed my branch names around (it showed origin/branchname instead of just branchname). So I went a step further and performed some sketchy surgery: I deleted the .git/objects directory from the original and put in the one from the clone. That did the trick, leaving all of the original branches, refs, etc. intact, and everything seems to work (crossing fingers). – Pyonephritis
Thanks for the tip about the file:// clone, that did the trick for me. – Donohoe
Be careful: git just links to the original when cloning locally (to save space, why have the same stuff twice?). Yes, you get a small clone; no, you cannot delete the original, as that would break the clone. – Teraterai
@Teraterai if you hard-link to a file and delete the original file, nothing happens except that a reference counter gets decremented from 2 to 1. Only when that counter reaches 0 is the space freed for other files on the filesystem. So no, even if the files were hard-linked, nothing would happen if the original got deleted. – Pepe
@IanKelling please add that the new repo dir should already exist. I just messed up my repo because directory #2 didn't exist... – Pliers
OMGolly! Not sure why this worked but this is fantastic. – Vikiviking
186

Some scripts I use:

git-fatfiles

git rev-list --all --objects | \
    sed -n $(git rev-list --objects --all | \
    cut -f1 -d' ' | \
    git cat-file --batch-check | \
    grep blob | \
    sort -n -k 3 | \
    tail -n40 | \
    while read hash type size; do 
         echo -n "-e s/$hash/$size/p ";
    done) | \
    sort -n -k1
...
89076 images/screenshots/properties.png
103472 images/screenshots/signals.png
9434202 video/parasite-intro.avi

If you want more lines, see also the Perl version in a neighbouring answer: https://mcmap.net/q/12766/-why-is-my-git-repository-so-big

git-eradicate (for video/parasite.avi):

git filter-branch -f  --index-filter \
    'git rm --force --cached --ignore-unmatch video/parasite-intro.avi' \
     -- --all
rm -Rf .git/refs/original && \
    git reflog expire --expire=now --all && \
    git gc --aggressive && \
    git prune

Note: the second script is designed to remove info from Git completely (including all info from reflogs). Use with caution.

Voncile answered 15/1, 2013 at 1:52 Comment(11)
Finally... Ironically, I saw this answer earlier in my search but it looked too complicated... after trying other things, this one started to make sense and voila! – Lamar
@msanteler, the former (git-fatfiles) script emerged when I asked the question on IRC (Freenode/#git). I saved the best version to a file, then posted it as an answer here. (I can't find the original author in the IRC logs, though.) – Voncile
This works very well initially. But when I fetch or pull from the remote again, it just copies all the big files back into the archive. How do I prevent that? – Loculus
@felbo, then the problem is probably not just in your local repository, but in other repositories as well. Maybe you need to do the procedure everywhere, or force everybody to abandon the original branches and switch to the rewritten ones. It is not easy in a big team and needs cooperation between developers and/or manager intervention. Sometimes just leaving the lodestone inside can be the better option. – Voncile
This function is great, but it's unimaginably slow. It can't even finish on my computer if I remove the 40-line limit. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the sizes summed per file or per folder. – Quixote
I committed a 10 MB image, noticed the mess, resized it to 100 KB and committed again with the same name. Your script for listing fat files now lists two files with the same name. When using filter-branch, how does it know which one to delete? – Therewith
@yellow01, you'll need a more advanced solution. Or run filter-branch starting from the commit where you had the image removed (then rebase the rest on top of it). – Voncile
How do I use that script/command? If that's a terminal command, it did nothing in my case. – Xeniaxeno
The fastest (and easiest) way to clean up a bloated Git history is to use the BFG (rtyley.github.io/bfg-repo-cleaner). – Scheider
This worked for me. @Scheider, thanks for the link to the BFG as well. – Digiacomo
How do I execute that script? – Wifely
69

git gc already does a git repack, so there is no sense in repacking manually unless you are going to pass some special options to it.

The first step is to see whether the majority of the space is (as would normally be the case) in your object database.

git count-objects -v

This should give a report of how many unpacked objects there are in your repository, how much space they take up, how many pack files you have and how much space they take up.
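
The output looks roughly like this; the field names come from git count-objects, but the numbers and the trailing annotations are only illustrative:

count: 4             # loose (unpacked) objects
size: 48             # disk space used by loose objects, in KiB
in-pack: 1570        # number of objects stored in pack files
packs: 1             # number of pack files
size-pack: 148432    # disk space used by the packs, in KiB
prune-packable: 0    # loose objects that also exist in a pack
garbage: 0
size-garbage: 0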

Ideally, after a repack, you would have no unpacked objects and one pack file, but it's perfectly normal to have some objects which aren't directly referenced by current branches still present and unpacked.

If you have a single large pack and you want to know what is taking up the space then you can list the objects which make up the pack along with how they are stored.

git verify-pack -v .git/objects/pack/pack-*.idx

Note that verify-pack takes an index file and not the pack file itself. This gives a report of every object in the pack, its true size and its packed size, as well as information about whether it has been 'deltified' and, if so, the origin of its delta chain.

To see if there are any unusually large objects in your repository, you can sort the output numerically on the third or fourth column (e.g. | sort -k3n).

From this output you will be able to see the contents of any object using the git show command, although it is not possible to see exactly where in the commit history of the repository the object is referenced. If you need to do this, try something from this question.
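
Putting the two steps together, a rough sketch (not from the original answer) that lists the ten largest blobs and maps each hash back to a path; /tmp/objects.txt is just a scratch file:

# List the ten largest blobs (field 3 of the verify-pack output is the object
# size), then map each SHA-1 back to a path via rev-list's object listing.
git rev-list --all --objects > /tmp/objects.txt
git verify-pack -v .git/objects/pack/pack-*.idx \
    | awk '$2 == "blob" { print $3, $1 }' \
    | sort -n | tail -10 \
    | while read size sha; do
          path=$(grep "$sha" /tmp/objects.txt | cut -d' ' -f2-)
          echo "$size $path"
      done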

Neron answered 24/6, 2009 at 5:54 Comment(2)
This found the big objects, great. The accepted answer got rid of them. – Sialagogue
The difference between git gc and git repack, according to Linus Torvalds: metalinguist.wordpress.com/2007/12/06/… – Chard
42

Just FYI, the biggest reason why you may end up with unwanted objects being kept around is that git maintains a reflog.

The reflog is there to save your butt when you accidentally delete your master branch or somehow otherwise catastrophically damage your repository.

The easiest way to fix this is to truncate your reflogs before compressing (just make sure that you never want to go back to any of the commits in the reflog).

git gc --prune=now --aggressive
git repack

This is different from git gc --prune=today in that it expires the entire reflog immediately.
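
If you prefer to expire the reflog entries explicitly rather than relying on gc's expiry settings, a common sequence (same caveat: you lose the safety net) looks something like this:

# Expire every reflog entry immediately, then prune unreachable objects.
# Only do this if you are certain you will never need the reflog to recover anything.
git reflog expire --expire=now --expire-unreachable=now --all
git gc --prune=now --aggressive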

Shekinah answered 6/1, 2013 at 19:53 Comment(2)
This one did it for me! I went from about 5 GB to 32 MB. – Otalgia
This answer seemed easier to do but unfortunately did not work for me. In my case I was working on a just-cloned repository. Is that the reason? – Peradventure
18

If you want to find what files are taking up space in your git repository, run

git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5

Then, extract the blob reference that takes up the most space (the last line), and check which filename is taking up so much space:

git rev-list --objects --all | grep <reference>

This might even be a file that you removed with git rm, but git remembers it because there are still references to it, such as tags, remotes and reflog.
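
If you also want to see which commits touched that blob, recent versions of Git can search for the object directly (a small sketch; <reference> is the blob hash found above):

# Show the commits, across all refs, in which that blob was added or removed.
git log --all --oneline --find-object=<reference>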

Once you know what file you want to get rid of, I recommend using git forget-blob

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

It is easy to use; just run:

git forget-blob file-to-forget

This will remove every reference from git, remove the blob from every commit in history, and run garbage collection to free up the space.

Scythia answered 23/1, 2017 at 12:50 Comment(0)
8

The git-fatfiles script from Vi's answer is lovely if you want to see the size of all your blobs, but it's so slow as to be unusable. I removed the 40-line output limit, and it tried to use all my computer's RAM instead of finishing. Plus it would give inaccurate results when summing the output to see all space used by a file.

I rewrote it in Rust, which I find to be less error-prone than other languages. I also added the feature of summing up the space used by all commits in various directories if the --directories flag is passed. Paths can be given to limit the search to certain files or directories.

src/main.rs:

use std::{
    collections::HashMap,
    io::{self, BufRead, BufReader, Write},
    path::{Path, PathBuf},
    process::{Command, Stdio},
    thread,
};

use bytesize::ByteSize;
use structopt::StructOpt;

#[derive(Debug, StructOpt)]
#[structopt()]
pub struct Opt {
    #[structopt(
        short,
        long,
        help("Show the size of directories based on files committed in them.")
    )]
    pub directories: bool,

    #[structopt(help("Optional: only show the size info about certain paths."))]
    pub paths: Vec<String>,
}

/// The paths list is a filter. If empty, there is no filtering.
/// Returns a map of object ID -> filename.
fn get_revs_for_paths(paths: Vec<String>) -> HashMap<String, PathBuf> {
    let mut process = Command::new("git");
    let mut process = process.arg("rev-list").arg("--all").arg("--objects");

    if !paths.is_empty() {
        process = process.arg("--").args(paths);
    };

    let output = process
        .output()
        .expect("Failed to execute command git rev-list.");

    let mut id_map = HashMap::new();
    for line in io::Cursor::new(output.stdout).lines() {
        if let Some((k, v)) = line
            .expect("Failed to get line from git command output.")
            .split_once(' ')
        {
            id_map.insert(k.to_owned(), PathBuf::from(v));
        }
    }
    id_map
}

/// Returns a map of object ID to size.
fn get_sizes_of_objects(ids: Vec<&String>) -> HashMap<String, u64> {
    let mut process = Command::new("git")
        .arg("cat-file")
        .arg("--batch-check=%(objectname) %(objecttype) %(objectsize:disk)")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to execute command git cat-file.");
    let mut stdin = process.stdin.expect("Could not open child stdin.");

    let ids: Vec<String> = ids.into_iter().cloned().collect(); // copy data for thread

    // Stdin will block when the output buffer gets full, so it needs to be written
    // in a thread:
    let write_thread = thread::spawn(|| {
        for obj_id in ids {
            writeln!(stdin, "{}", obj_id).expect("Could not write to child stdin");
        }
        drop(stdin);
    });

    let output = process
        .stdout
        .take()
        .expect("Could not get output of command git cat-file.");

    let mut id_map = HashMap::new();
    for line in BufReader::new(output).lines() {
        let line = line.expect("Failed to get line from git command output.");

        let line_split: Vec<&str> = line.split(' ').collect();

        // skip non-blob objects
        if let [id, "blob", size] = &line_split[..] {
            id_map.insert(
                id.to_string(),
                size.parse::<u64>().expect("Could not convert size to int."),
            );
        };
    }
    write_thread.join().unwrap();
    id_map
}

fn main() {
    let opt = Opt::from_args();

    let revs = get_revs_for_paths(opt.paths);
    let sizes = get_sizes_of_objects(revs.keys().collect());

    // This skips directories (they have no size mapping).
    // Filename -> size mapping tuples. Files are present in the list more than once.
    let file_sizes: Vec<(&Path, u64)> = sizes
        .iter()
        .map(|(id, size)| (revs[id].as_path(), *size))
        .collect();

    // (Filename, size) tuples.
    let mut file_size_sums: HashMap<&Path, u64> = HashMap::new();
    for (mut path, size) in file_sizes.into_iter() {
        if opt.directories {
            // For file path "foo/bar", add these bytes to path "foo/"
            let parent = path.parent();
            path = match parent {
                Some(parent) => parent,
                _ => {
                    eprint!("File has no parent directory: {}", path.display());
                    continue;
                }
            };
        }

        *(file_size_sums.entry(path).or_default()) += size;
    }
    let sizes: Vec<(&Path, u64)> = file_size_sums.into_iter().collect();

    print_sizes(sizes);
}

fn print_sizes(mut sizes: Vec<(&Path, u64)>) {
    sizes.sort_by_key(|(_path, size)| *size);
    for file_size in sizes.iter() {
        // The size needs some padding--a long size is as long as a tabstop
        println!("{:10}{}", ByteSize(file_size.1), file_size.0.display())
    }
}

Cargo.toml:

[package]
name = "git-fatfiles"
version = "0.1.0"
edition = "2018"
[dependencies]
structopt = { version = "0.3"}
bytesize = {version = "1"}

Options:

USAGE:
    git-fatfiles [FLAGS] [paths]...

FLAGS:
    -d, --directories    Show the size of directories based on files committed in them.
    -h, --help           Prints help information

ARGS:
    <paths>...    Optional: only show the size info about certain paths.
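
A typical invocation from the project directory might look like this (the src/ path is just an example):

# Build and run with Cargo; sum sizes per directory, limited to the src/ tree.
cargo run --release -- --directories src/
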
Quixote answered 28/7, 2017 at 6:8 Comment(6)
Heads-up that this doesn't handle paths with spaces correctly. You can see my fix here: github.com/truist/settings/commit/… – Iago
@NathanArthur Thanks for the info! I just rewrote the script in Rust and linked to your GitHub as the original version. Let me know if you prefer I don't link to it in the answer. While I was working, I also noticed %fileSizes should not be a hash, since filenames appear more than once in the data. The Rust version is fixed, but I'm not sure what the semantics of the Perl version should be when a file appears in the data more than once. I made --sum not optional, which clears up the semantics. – Quixote
I spent some time looking at the problem with %fileSizes and I don't agree that it's wrong. Your new implementation (and the old one, with --sum) will tell you the cumulative size used by a file throughout its history. But that might obscure the giant files that might be in the history somewhere; a frequently changed small file might have a huge cumulative size. Both versions are useful. In my local example, the worst single file is the 20th-largest file (cumulatively), and the other 19 are just source files with lots of changes. – Iago
(Also, FWIW, a Perl script is much easier to copy and run than a Rust script. I had to install Rust and learn many things about Rust package management just to run this.) – Iago
@NathanArthur In the original version of this code (yours as well), the script gives a different result every time it runs. Using a hash for %fileSizes seems okay only if every iteration does a comparison and only updates if the new size is larger. And sorry about the inconvenience of installing Rust; it's the trade-off I chose to reduce bugs and improve readability. At least it's easier than .NET or Java project setup. I'll make the project file naming more explicit. – Quixote
I started investigating this and went down a rabbit hole of bug discovery. You're right about %fileSizes, and it breaks --sum (and --directories) entirely. I rewrote the script from scratch and described my findings in a new answer. The new script is at the same URL as the old one. – Iago
4

Are you sure you are counting just the .pack files and not the .idx files? They are in the same directory as the .pack files, but do not have any of the repository data (as the extension indicates, they are nothing more than indexes for the corresponding pack — in fact, if you know the correct command, you can easily recreate them from the pack file, and git itself does it when cloning, as only a pack file is transferred using the native git protocol).

As a representative sample, I took a look at my local clone of the linux-2.6 repository:

$ du -c *.pack
505888  total

$ du -c *.idx
34300   total

This indicates that an expansion of around 7% should be expected.

There are also files outside objects/; in my personal experience, index and gitk.cache tend to be the biggest of them (totaling 11M in my clone of the linux-2.6 repository).
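
If you want to check how that breaks down in your own repository, a quick sketch (gitk.cache only exists if you have used gitk):

# Rough breakdown of where the space inside .git goes.
du -sch .git/objects/pack/*.pack .git/objects/pack/*.idx
du -sh .git/index .git/gitk.cache 2>/dev/null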

Treenware answered 23/6, 2009 at 1:55 Comment(0)
3

Other git objects stored in .git include trees, commits, and tags. Commits and tags are small, but trees can get big, particularly if you have a very large number of small files in your repository. How many files and how many commits do you have?
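
For reference, one quick way to get both numbers (a minimal sketch):

# Commits reachable from any ref, and files tracked in the current checkout.
git rev-list --all --count
git ls-files | wc -l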

Alyose answered 23/6, 2009 at 0:39 Comment(2)
Good question. 19 branches with about 40 files in each. git count-objects -v says "in-pack: 1570". Not sure exactly what that means or how to count how many commits I have. A few hundred, I'd guess. – Sialagogue
OK, it doesn't sound like that is the answer then. A few hundred will be insignificant compared to 145 MB. – Alyose
2

Did you try using git repack?

Jayjaycee answered 23/6, 2009 at 0:21 Comment(2)
Good question. I did; I also got the impression that git gc does that too? – Sialagogue
It does, with git gc --auto. Not sure about what you used. – Jayjaycee
2

Before running git filter-branch and git gc, you should review the tags that are present in your repo. Any real system which has automatic tagging for things like continuous integration and deployments will leave unwanted objects still referenced by these tags; hence gc can't remove them, and you will keep wondering why the repo is still so big.

The best way to get rid of all the unwanted stuff is to run git filter-branch and git gc, and then push master to a new bare repo. The new bare repo will have the cleaned-up tree.
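
A rough sketch of that workflow; the tag names, remote name and path below are placeholders:

# 1. Review tags and delete any that pin unwanted history (example tag name).
git tag -l
git tag -d old-ci-build-123
git push origin :refs/tags/old-ci-build-123    # also remove it from the remote

# 2. After git filter-branch and git gc, push the cleaned branch to a fresh bare repo.
git init --bare /path/to/clean.git
git push /path/to/clean.git master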

Obadiah answered 9/9, 2011 at 1:54 Comment(0)
1

This can happen if you accidentally added a big chunk of files and staged them, without necessarily committing them. For example, in a Rails app you run bundle install --deployment and then accidentally run git add .; you notice all the files added under vendor/bundle and unstage them, but they have already made it into the git history. To fix it, apply Vi's answer, replacing video/parasite-intro.avi with vendor/bundle, and then run the second command he provides.

You can see the difference with git count-objects -v: in my case, before applying the script, size-pack was 52K; afterwards it was 3.8K.
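
Concretely, adapting the commands from Vi's answer to this case might look like the following sketch (note the extra -r, since vendor/bundle is a directory):

# Rewrite all refs to drop vendor/bundle from history, then expire and prune.
git filter-branch -f --index-filter \
    'git rm -r --force --cached --ignore-unmatch vendor/bundle' \
    -- --all
rm -Rf .git/refs/original && \
    git reflog expire --expire=now --all && \
    git gc --aggressive && \
    git prune
echo 'vendor/bundle' >> .gitignore    # keep it from being added again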

Rupe answered 8/11, 2016 at 0:28 Comment(0)
1

It is worth checking stacktrace.log. It is basically an error log for tracing failed commits. I recently found out that my stacktrace.log was 65.5 GB while my app was 66.7 GB.

Azotic answered 9/4, 2018 at 6:47 Comment(0)
1

I've created a new implementation of the Perl script that was originally provided in this answer (which has since been rewritten in Rust). After much investigation of that Perl script, I realized that it had multiple bugs:

  • Errors with paths with spaces
  • --sum didn't work correctly (it wasn't actually adding up all the deltas)
  • --directory didn't work correctly (it relies on --sum)
  • Without --sum, it would report the size of an effectively random object for the given path, which might not have been the largest one

So I ended up rewriting the script entirely. It uses the same sequence of git commands (git rev-list and git cat-file) but then it processes the data correctly to give accurate results. I preserved the --sum and --directories features.

I also changed it to report the "disk" size (i.e. the compressed size in the git repo) of the files, rather than the original file sizes. That seems more relevant to the problem at hand. (This could be made optional, if someone wants the uncompressed sizes for some reason.)

I also added an option to only report on files that have been deleted, on the assumption that files still in use are probably less interesting. (The way I did that was a bit of a hack; suggestions welcome.)

The latest script is here. I can also copy it here if that's good StackOverflow etiquette? (It's ~180 lines long.)

Iago answered 7/9, 2021 at 17:54 Comment(2)
Nice. This is much more readable than my original script. I borrowed your technique of using %(objectsize:disk). – Quixote
Yes, it is definitely considered good etiquette to include the script. I thought it was actually some sort of rule, but I'm having some difficulty finding the site rules in the help center... – Ultrasonic
-1

Create a new branch where the current commit is the initial commit, with all history gone, to reduce the number of git objects and the overall history size.

Note: Please read the comment before running the code.

  1. git checkout --orphan latest_branch
  2. git add -A
  3. git commit -a -m "Initial commit message"   # commit the changes
  4. git branch -D master                        # delete the master branch
  5. git branch -m master                        # rename the current branch to master
  6. git push -f origin master                   # force-push to the master branch
  7. git gc --aggressive --prune=all             # remove the old files
Assiut answered 19/5, 2021 at 4:13 Comment(0)
