How can I trigger garbage collection on a Git remote repository?

4

70

As we know, we can periodically run git gc to pack objects under .git/objects.

In the case of a remote central Git repository (bare or not), though, after many pushes there are many files under myproj.git/objects; each commit seems to create a new file there.

How can I pack that many files? (I mean the ones on the remote central bare repository, not on local clone repository.)

Lubbi answered 2/7, 2010 at 1:43 Comment(1)
see also stackoverflow.com/questions/3532740/… - Mesencephalon
53

The remote repo should be configured to run gc as needed after a push is received. See the documentation of gc.auto in the git-gc and git-config man pages.

However, a remote repo shouldn't need all that much garbage collection, since it will rarely have dangling (unreachable) commits. These usually result from things like branch deletion and rebasing, which typically happen only in local repos.

So gc is needed more for repacking, which is for saving storage space rather than removing actual garbage. The gc.auto variable is sufficient for taking care of this.
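A minimal sketch of checking and tuning this on the server, assuming shell access to the machine that hosts the bare repository. The repository path is hypothetical (a throwaway repository is created so the snippet runs anywhere) and the threshold of 1000 is only an illustrative value; gc.auto, receive.autoGc and git gc --auto are described in the git-config and git-gc man pages.

```shell
# Hypothetical bare repository; substitute the real path on the server.
REPO="$(mktemp -d)/myproj.git"
git init --quiet --bare "$REPO"

# gc.auto is the approximate number of loose objects above which
# "git gc --auto" decides to repack. 1000 is an illustrative value.
git --git-dir="$REPO" config gc.auto 1000

# receive.autoGc controls whether receive-pack runs "git gc --auto"
# after a push; it is normally on by default, set here only to be explicit.
git --git-dir="$REPO" config receive.autoGc true

# Show the values the repository will use.
git --git-dir="$REPO" config --get gc.auto
git --git-dir="$REPO" config --get receive.autoGc

# Collect only if the thresholds are exceeded; otherwise this does nothing.
git --git-dir="$REPO" gc --auto
```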

Unreadable answered 2/7, 2010 at 5:35 Comment(4)
Not necessarily. If we push a bugfix branch for several people to collaborate on, then rebase it when merging to master, we are essentially rebasing on the remote. Even if we don't rebase, the remote repo should still pack things up frequently so that new folks can clone much faster. - Riemann
@Ryuu, yes, you make a good point. This is one of the ways in which a remote repo could end up with loose objects. However, I did say "typically" when talking about rebasing. Maybe this scenario is becoming more common as people's use of git becomes more sophisticated. Even so, automatic garbage collection should take care of it eventually. - Unreadable
I think loose commits also result when doing a force push to overwrite the last pushed commit (?), but I could be wrong. Any thoughts? - Leesa
@Sнаđошƒаӽ Yes, that creates loose objects. It comes under the heading of rebasing, and although that's not as common in remote repos, it still happens, especially with a pull-request workflow or any other workflow that allows users to have private branches on a remote repo that they can modify in a non-fast-forward way. When I wrote this in 2010, GitHub was not as dominant as it is now. - Unreadable
14

While you should have some process that takes care of this periodically and automatically, it's no problem to run

git gc

on a bare repository

git@domU:/pix/git/repositories/abd.git$ ls -l

total 28
drwxrwxr-x   2 git git    6 2010-06-06 02:44 branches
-rw-rw-r--   1 git git   66 2010-06-06 02:44 config
-rw-r--r--   1 git git   23 2011-03-15 18:19 description
-rw-rw-r--   1 git git   23 2010-06-06 02:44 HEAD
drwxrwxr-x   2 git git 4096 2010-06-06 02:44 hooks
drwxrwxr-x   2 git git   20 2010-06-06 02:44 info
drwxrwxr-x 260 git git 8192 2010-09-01 00:26 objects
drwxrwxr-x   4 git git   29 2010-06-06 02:44 refs

$ git gc
Counting objects: 3833, done.
Compressing objects:  31% (1085/3500)...
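
If you only have SSH access rather than an interactive shell on the server, the same command can be sent over SSH. The sketch below shows that one-liner as a comment (host and path are taken from the prompt above; adjust them to your setup) and then demonstrates the effect on a throwaway bare repository, so it can be run anywhere. The demo paths and the commit identity are made up for illustration.

```shell
# On a real server this is a single command over SSH:
#   ssh git@domU 'git --git-dir=/pix/git/repositories/abd.git gc'

# Self-contained demonstration on a throwaway bare repository.
WORK="$(mktemp -d)"
DEMO="$WORK/demo.git"
git init --quiet --bare "$DEMO"
git clone --quiet "$DEMO" "$WORK/clone" 2>/dev/null

# Create one commit in the clone and push it to the bare repository.
echo hello > "$WORK/clone/file.txt"
git -C "$WORK/clone" add file.txt
git -C "$WORK/clone" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "first commit"
git -C "$WORK/clone" push --quiet origin HEAD

# "count" is the number of loose objects, "in-pack" the number already packed.
git --git-dir="$DEMO" count-objects -v

# Pack the loose objects; running gc on a bare repository is fine.
git --git-dir="$DEMO" gc --quiet

# Compare the counts after packing.
git --git-dir="$DEMO" count-objects -v
```

Comparing the two count-objects reports shows whether loose objects were moved into a pack.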
Camelliacamelopard answered 13/6, 2011 at 0:23 Comment(1)
He is asking how to do this remotely, on GitHub - Easily
6

after many pushes, there are many files under myproj.git/objects

There won't be as many with Git 2.11+ (Q4 2016) and a pre-receive hook.
In that scenario, you won't have to trigger a git gc at all.

See commit 62fe0eb, commit e34c2e0, commit 722ff7f, commit 2564d99, commit 526f108 (03 Oct 2016) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 25ab004, 17 Oct 2016)

receive-pack: quarantine objects until pre-receive accepts

In order for the receiving end of "git push" to inspect the received history and decide to reject the push, the objects sent from the sending end need to be made available to the hook and the mechanism for the connectivity check, and this was done traditionally by storing the objects in the receiving repository and letting "git gc" to expire it.

Instead, store the newly received objects in a temporary area, and make them available by reusing the alternate object store mechanism to them only while we decide if we accept the check, and once we decide, either migrate them to the repository or purge them immediately.

That temporary area is exposed to hooks through the new environment variable GIT_QUARANTINE_PATH (the name behind the GIT_QUARANTINE_ENVIRONMENT macro in Git's source).

That way, if a (big) push is rejected by a pre-receive hook, those big objects won't be lying around until a later git gc prunes them (by default, only after the two-week gc.pruneExpire grace period).
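
A rough way to observe this, assuming Git 2.11 or later is installed: the sketch below installs a pre-receive hook that refuses oversized blobs, attempts a push that violates it, and then inspects the server's object store. The 1000-byte limit, the file names, and the demo identity are all hypothetical; the hook itself is an illustration, not the policy of any particular server.

```shell
WORK="$(mktemp -d)"
SERVER="$WORK/server.git"
git init --quiet --bare "$SERVER"
mkdir -p "$SERVER/hooks"

# Hypothetical policy: refuse any push introducing a blob over 1000 bytes.
cat > "$SERVER/hooks/pre-receive" <<'EOF'
#!/bin/sh
limit=1000
status=0
while read -r old new ref; do
    # An all-zero new id means the ref is being deleted: nothing to check.
    case "$new" in *[!0]*) ;; *) continue ;; esac
    # Objects reachable from the new tip but from no existing ref.
    too_big=$(git rev-list --objects "$new" --not --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v limit="$limit" '$1 == "blob" && $2 > limit { print $3 }')
    if [ -n "$too_big" ]; then
        echo "rejected $ref: oversized file(s): $too_big" >&2
        status=1
    fi
done
exit $status
EOF
chmod +x "$SERVER/hooks/pre-receive"

# Commit a 5000-byte file in a clone and try to push it.
git clone --quiet "$SERVER" "$WORK/clone" 2>/dev/null
head -c 5000 /dev/zero > "$WORK/clone/big.bin"
git -C "$WORK/clone" add big.bin
git -C "$WORK/clone" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "add big file"

if git -C "$WORK/clone" push --quiet origin HEAD; then
    PUSH_RESULT=accepted
else
    PUSH_RESULT=rejected
fi
echo "push $PUSH_RESULT"

# With quarantine, a rejected push should leave no objects behind here.
git --git-dir="$SERVER" count-objects -v
```

The hook sees the incoming objects because they are reachable through the quarantine area while it runs; once it exits non-zero, that area is discarded rather than migrated.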

Hardman answered 25/10, 2016 at 6:13 Comment(0)
2

This question should shed some light on how often you should run garbage collection.

The easiest option would be to use a scheduled task in Windows or a cron job in Unix to run git gc periodically on the machine that hosts the repository. This way you don't even need to think about it.
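
As a sketch of what such a job might run, assuming all bare repositories live under one directory and are named *.git: the function name gc_all, the script path, and /srv/git in the crontab comment are hypothetical, chosen only for illustration.

```shell
# Run "git gc --auto" in every bare repository directly under a root directory.
gc_all() {
    root="$1"
    for repo in "$root"/*.git; do
        [ -d "$repo" ] || continue      # skip when the glob matches nothing
        echo "checked: $repo"
        # --auto repacks only when Git's own thresholds are exceeded.
        git --git-dir="$repo" gc --auto --quiet ||
            echo "gc failed: $repo" >&2
    done
}

# Example crontab entry (hypothetical paths), nightly at 03:00, for a script
# that defines gc_all and calls it with its first argument:
#   0 3 * * * /usr/local/bin/git-gc-all.sh /srv/git

# Local demonstration with two throwaway repositories.
ROOT="$(mktemp -d)"
git init --quiet --bare "$ROOT/one.git"
git init --quiet --bare "$ROOT/two.git"
gc_all "$ROOT"
```

Using gc --auto keeps the nightly run cheap; a plain git gc could be substituted if a full repack on every run is wanted.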

Underpants answered 2/7, 2010 at 1:54 Comment(2)
Thanks for your suggestions, but my question is how to run git gc on a remote bare repository, not on a local cloned repository. - Lubbi
Pretty sure you can't invoke git gc remotely; that's why you have to schedule it on the machine containing the bare repository. - Underpants
