Will Git GC eventually free up the space taken by outdated LFS objects?

2

5

I have a situation, which can be expressed elegantly and exactly by the following SO comment:

I am actually running out of storage in my bitbucket repo which is the reason I was concerned with space in the first place. My understanding of vanilla git principles is that git has a garbage collector that runs periodically and removes any objects which does not have any references to it anymore. The LFS files most certainly do not have any commits referring to it, so by git principles, those files should be automatically removed, right?

So, is it true that the space used by old, outdated LFS files that are no longer in the local repo will eventually be reclaimed by Git GC? In other words, if I wait long enough, will I stop "running out of storage" because GC has freed up the space?

My host is Bitbucket, if that matters.

Donaldson answered 16/8, 2022 at 12:28 Comment(1)
LFS files aren't stored in Git at all (Git stores, instead, "pointer" files: the LFS add-on hides the real files from Git, swapping in fake pointer files, then swapping in the real files later, also behind Git's back). So git gc has no effect on them. You might consider asking the LFS people (see git-lfs) how to clean up old LFS files. - Ovoviviparous
M
6

Due to the shared nature of the storage host for LFS, the storage host cannot ever know when it is safe to delete files. Therefore, you must manually tell the storage host which files to delete.

Normally, git can safely delete files it no longer references because the repo is self-contained, having everything it needs to switch to any given branch or commit. If git cannot reach a commit any more, then there is no way you would ever need a file referenced only by that commit. That is why git is able to safely delete files. If the file is referenced by a commit in another repo, then that repo must have a copy of the file, which it can push whenever it pushes the commit to a remote.

By using LFS, the repo is no longer self-contained. Some files are now stored on the LFS storage host rather than in the repo itself. The repo stores only references to those files, and the files themselves are fetched on demand from the storage host (with caching, so that they do not need to be fetched from the storage host every time).
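To make the "stored as references" point concrete, here is a minimal sketch that parses an LFS pointer file, the small text file Git tracks in place of the real content. The function and class names are made up for illustration, and the pointer contents are an example, not taken from any real repository.

```python
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    oid: str   # hex SHA-256 digest of the real file content
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the text that Git stores in place of an LFS-tracked file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected hash algorithm: {algo!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


# Example pointer file contents (illustrative values).
example = """version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
"""
pointer = parse_lfs_pointer(example)
```

Git only ever sees these few lines, so git gc can at most collect the pointer. The object identified by the oid lives on the storage host and is untouched.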

Git is a distributed SCM. This makes it hard, if not impossible, to know about all the clones of the repo that exist. As such, the storage host for LFS can never know about all the commits that exist on all clones of the repo, and therefore can never know when it is safe to delete a file. The best that LFS can do is prune local copies of LFS files from your local machine (git lfs prune).

You could provide a set of rules for reasonable deletion. For example, if it has been at least a month since the file was last referenced by any commit in the origin repository, then the file will be pruned. However, you could still end up with local repositories that have hanging references to files that no longer exist on the storage host. But that is a risk you have to take for any deletion of a file from the storage host.
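A rule like this could be sketched as follows. This is only an illustration of the idea: the function name, the input shapes, and the 30-day grace period are assumptions, and neither Git LFS nor Bitbucket provides such a rule out of the box as far as this answer is concerned.

```python
from datetime import datetime, timedelta


def deletion_candidates(last_referenced, still_referenced, now,
                        grace=timedelta(days=30)):
    """Return OIDs that origin no longer references and that are past the grace period.

    last_referenced: dict mapping OID -> datetime when origin was last seen
                     referencing it (hypothetical bookkeeping you would keep)
    still_referenced: set of OIDs that some commit on origin references now
    """
    return sorted(
        oid
        for oid, seen in last_referenced.items()
        if oid not in still_referenced and now - seen >= grace
    )


now = datetime(2022, 8, 16)
last_referenced = {
    "aaa": datetime(2022, 8, 10),  # dropped recently: keep for now
    "bbb": datetime(2022, 6, 1),   # dropped long ago: candidate
    "ccc": datetime(2022, 5, 1),   # stale timestamp, but still referenced: keep
}
candidates = deletion_candidates(last_referenced, {"ccc"}, now)
```

Note that the rule only looks at the origin repository, which is exactly why it cannot rule out a clone that still needs one of the candidates.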

The following is a somewhat contrived example, but it shows why it is hard to know when it is safe to delete a file from the LFS storage host. Let us say that someone goes on parental leave. In their local repo is a branch that has since been deleted on the origin repo. A month passes and the file on the storage host is pruned, as it is no longer referenced in the origin repo. The person returns and decides to check out that branch (perhaps for the first time). Their local repo does not have a copy of the file that the LFS reference points to. It asks the storage host for a copy of the file and finds that it no longer exists. There is nothing that can be done: that branch/commit is now broken forever.

Since Git LFS v2.4, you can use git lfs ls-files --all to list all LFS files reachable from your current repository. That is, all files reachable from all reachable commits, not just from a single given commit. This should help with figuring out which files might be safe to delete.
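As a rough sketch of how that listing could be used, the snippet below parses text in the shape of the command's output and compares it with the objects on the storage host. It assumes each line has the form "<oid> <marker> <path>", with a marker of * or -. The sample text, its shortened OIDs, and the on_host set are hypothetical stand-ins; in practice you would capture the real output (for example with the --long flag to get full OIDs) and obtain the host's list from your provider.

```python
def parse_ls_files(output: str) -> dict:
    """Map each OID to the set of paths listed for it.

    Assumes lines of the form '<oid> <marker> <path>'.
    """
    by_oid = {}
    for line in output.splitlines():
        # Split at most twice so that paths containing spaces stay intact.
        parts = line.split(" ", 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        oid, _marker, path = parts
        by_oid.setdefault(oid, set()).add(path)
    return by_oid


# Hypothetical sample in the assumed output format (OIDs shortened).
sample = (
    "4d7a214614 * assets/logo.psd\n"
    "9f2c01beef - video/intro clip.mp4\n"
)
reachable = parse_ls_files(sample)

# Hypothetical set of OIDs that the storage host reports as stored.
on_host = {"4d7a214614", "9f2c01beef", "0badc0ffee"}

# Objects on the host that nothing reachable from this clone refers to.
unreachable_here = sorted(on_host - set(reachable))
```

An object that shows up in unreachable_here is unreachable only from this clone. As explained above, another clone may still need it, so treat the result as a list of candidates to review rather than a list that is safe to delete.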

Meekins answered 16/8, 2022 at 14:27 Comment(0)
B
3

It seems that you have to delete them manually.

The Git LFS command-line client doesn't support pruning files from the server, so how you delete them depends on your hosting provider.

In Bitbucket Cloud, you can view and delete Git LFS files via Repository Settings > Git LFS:

...

Branch answered 16/8, 2022 at 12:31 Comment(0)
