How to clean up garbage in a remote git repo

I recently ran into a size limit with my Bitbucket repo. Following the many existing questions on how to clean up a git repo, I ended up using BFG to remove some bad commits.

This worked great. However, after running git count-objects I noticed that a large amount of space was sitting in garbage, and a plain git gc did nothing to clean it up.
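For anyone wanting to reproduce the check: git count-objects -v reports loose object counts and sizes, pack sizes, and garbage bytes. A minimal sketch against a throwaway repo (paths here are placeholders):

```shell
# Create a throwaway repo with one commit, then inspect object storage.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
echo "hello" > file.txt
git add file.txt
git commit -qm "initial commit"

# -v lists loose object count/size, pack count/size, and garbage bytes.
git count-objects -v
```

The "garbage" and "size-garbage" lines in the output are what the question refers to.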

After some digging I found the following command:

git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 \
-c gc.rerereunresolved=0 -c gc.pruneExpire=now gc "$@"

Running this led to the garbage being cleaned up locally. However, I still have the issue of the remote repo. Do I now need to get Bitbucket to run this command on my remote repo, or is there a way to push this change to the repository?

Dead answered 9/1, 2015 at 19:28 Comment(1)
I can't speak for Bitbucket, but in general, bare clones don't keep reflogs; there would not be any rerere data either. Generally you just need to update the refs on the remote and trigger a gc there. – Sing

If anyone else is experiencing this, the answer turned out to be yes.

Bitbucket support ran the following:

git reflog expire --expire="1 hour" --all
git reflog expire --expire-unreachable="1 hour" --all
git prune --expire="1 hour" -v
git gc --aggressive --prune="1 hour"

This reduced the remote repo size from over 2 GB to under 1 GB.
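The same sequence can be rehearsed locally in a scratch repo. This sketch uses --expire="now" instead of "1 hour" so the effect is visible immediately; the repo and commit names are made up for illustration:

```shell
# Scratch repo: create a commit, discard it, then run the cleanup
# sequence and verify the discarded commit's objects are gone.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m "keep"
echo "big payload" > big.txt
git add big.txt
git commit -qm "drop me"
dropped=$(git rev-parse HEAD)
git reset -q --hard HEAD~1            # now only the reflog references it

git reflog expire --expire="now" --all
git reflog expire --expire-unreachable="now" --all
git prune --expire="now"
git gc --quiet --aggressive --prune="now"

# The object for the dropped commit no longer exists.
git cat-file -e "$dropped" 2>/dev/null || echo "commit pruned"
```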

Dead answered 9/1, 2015 at 20:8 Comment(0)

We think we had the same problem today and were able to solve it without contacting Bitbucket support, as described below. Note that this method discards the last commit from the repo, so you will probably want a backup of it.

Bitbucket reported that our repo was about 2.1GB, while a fresh clone only took about 250MB locally. From this, we concluded that the extra space most likely came from big files in unreachable commits (thanks to this answer).

This is how to list unreachable commits locally, ignoring reachability via the reflog:

git fsck --unreachable --no-reflog
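To see which unreachable objects are actually eating space, the fsck output can be piped through git cat-file --batch-check. This pipeline is not from the original answer, just a sketch (the demo repo and orphaned blob are fabricated for illustration):

```shell
# Set up a repo containing one unreachable blob for demonstration.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m "init"
# Write a blob that no ref points to.
blob=$(echo "orphaned data" | git hash-object -w --stdin)

# List unreachable objects with their sizes, largest first.
git fsck --unreachable --no-reflog \
  | awk '/unreachable/ {print $3}' \
  | git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname)' \
  | sort -rn
```

Each output line is "size type sha", so the biggest offenders appear at the top.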

Locally, unreachable commits can be cleaned with:

git reflog expire --expire-unreachable="now" --all
git prune --expire="now" -v
git gc --aggressive --prune="now"

We cannot, however, run any of these commands remotely on Bitbucket. But on their page about reducing repo size (section Remove the repository limitation), they say that they run git gc themselves in response to a git reset --hard HEAD~1 (which discards the last commit) followed by a git push -f. In the section Garbage collecting dead data, they also suggest trying the sequence:

git reflog expire --expire=now --all
git gc --prune=now
git push --all --force

Given all this, I decided to try the following locally, hoping it would trim the reflog and prune locally, and then push the changes to the remote Bitbucket repository, where it would trigger a gc:

git reflog expire --expire-unreachable="30m" --all
git prune --expire="30m" -v
git gc --prune="30m"
git reset --hard HEAD~1
git push -f

This worked: the repo size immediately went from 2.1GB to ca. 250MB. :)

Note that the time parameter passed to expire / expire-unreachable / prune sets the expiration cut-off point, measured backwards from now. So e.g. "now" means expire / prune everything, and "30m" means everything except changes from the last 30 minutes.
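The cut-off semantics can be checked locally in a scratch repo. This sketch uses a spelled-out approxidate like "1.hour.ago", which avoids any ambiguity of short forms like "30m" (see the comment below about minutes vs. months):

```shell
# Scratch repo with one unreachable blob, created just now.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m "init"
blob=$(echo "scratch" | git hash-object -w --stdin)

# Cut-off one hour in the past: the fresh blob survives.
git prune --expire="1.hour.ago"
git cat-file -e "$blob" && echo "kept"

# Cut-off "now": every unreachable object is pruned.
git prune --expire="now"
git cat-file -e "$blob" 2>/dev/null || echo "pruned"
```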

Donoho answered 16/5, 2016 at 11:48 Comment(7)
The git reset --hard HEAD~1 and git push -f trick is gold dust, thanks! – Newport
Does this work on any branch, or do you have to do the reset on master? – Whilst
Instead of having to back up the head commit, couldn't you intentionally create a noise commit, push that, and then do the HEAD~1 trick to wipe out that unwanted noise commit? That seems much safer and simpler. – Phenyl
This didn't work immediately for me on Bitbucket Cloud. I did this, then opened a ticket, and they activated GC on the server. I've read that they used to run GC after every push, but now they only do it sometimes, so you need to open a ticket if you are close to the limit. – Jillayne
Note that 30m doesn't seem to mean 30 minutes, but 30 months. I tried pruning some ten-minute-old objects with gc --prune=1m and they would not go away. They went away with 0m. – Eleanor
@JozefLegény I did it on a new branch and it worked. It lets you create a dummy commit on a dummy branch without impacting colleagues :) – Tweet
Please vote for the feature request to be able to do this manually from the web UI, if you don't want to create a support request every time: jira.atlassian.com/browse/BCLOUD-19771 – Amphitrite

I came across this article while searching for a solution to the oversized Bitbucket repository issue.

I found a Bitbucket post detailing how to trigger automatic garbage collection, published in November 2022: https://bitbucket.org/blog/updated-repository-limits-and-automatic-garbage-collection.

In essence, the post confirms that executing git reset --hard HEAD~1 followed by git push --force now automatically triggers a full garbage collection, as previously discussed here.
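As a sanity check, the trigger sequence can be rehearsed against a local bare repository standing in for Bitbucket. This is only a sketch (it assumes git >= 2.28 for init -b; on the real Bitbucket the force-push is what triggers the gc), and note that the tip commit must be saved before the reset:

```shell
# Stand-in "remote": a local bare repo.
tmp=$(mktemp -d)
git init -q --bare -b main "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/main   # match the remote's default branch
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git push -q origin main

orig=$(git rev-parse HEAD)    # remember the tip before discarding it
git reset -q --hard HEAD~1    # drop the last commit locally
git push -qf origin main      # force-push; on Bitbucket, this triggers gc
git reset -q --hard "$orig"   # restore the commit
git push -qf origin main      # push it back up
```

After the second force-push the remote branch points at the original tip again, so no history is lost.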

Carlsen answered 21/3 at 4:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.