git lfs prune to remove files from lfs and push to origin
M

4

12

So here's what's happened:

  1. Accidentally committed lots of files that weren't meant to be.
  2. Did a git reset --soft HEAD~2 to get back to a commit before the accident
  3. Modified gitignore to ignore the files
  4. Committed again and pushed to origin.

I assumed the git reset would reverse everything from the accidental commit, but after checking Bitbucket's list of git lfs files, it seems all the lfs tracked files from the accidental commit were pushed to lfs in origin. These files do not exist if I look through the source in Bitbucket.

So I tried git lfs prune, which appeared to delete about as many files as were accidentally committed, then git lfs push origin master. I checked Bitbucket's list of git lfs files again, but those files are still there and nothing has changed in origin.

What have I done wrong?

Metagenesis answered 28/8, 2017 at 17:17 Comment(0)
D
16

There doesn't appear to be a standard way of doing this:

The Git LFS command-line client doesn't support pruning files from the server, so how you delete them depends on your hosting provider.

Bitbucket allows you to delete LFS files using its web UI (please read the entire linked page before proceeding):

Delete individual LFS files from your repository

It's important to understand that:

  • The delete operation described here is destructive – there's no way to recover the LFS files referenced by the deleted LFS pointer files (it's not like the git remove command!) – so you'll want to back up the LFS files first.
  • Deleting an LFS file only deletes it from the remote storage. All reference pointers stored in your Git repo will remain.
  • No branch, tag or revision will be able to reference the LFS files in future. If you attempt to check out a branch, tag or revision that includes a pointer file referencing a deleted LFS file, you'll get a download error and the check out will fail.
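
To make the "pointer" wording above concrete: a Git LFS pointer is a small text file with `version`, `oid` and `size` lines that Git stores in place of the real content. Below is a minimal sketch, assuming a POSIX shell with awk. The helper name `lfs_pointer_oid` and the sample OID and size are made up for illustration.

```shell
#!/bin/sh
# Print the SHA-256 OID recorded in a Git LFS pointer read from stdin.
# (Hypothetical helper name, not part of git or git-lfs.)
lfs_pointer_oid() {
    awk '$1 == "oid" { sub(/^sha256:/, "", $2); print $2 }'
}

# Example pointer text; the OID and size are invented for this sketch.
pointer='version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345'

oid=$(printf '%s\n' "$pointer" | lfs_pointer_oid)
echo "$oid"
```

In a real repository the pointer text could be read with something like `git cat-file -p HEAD:path/to/asset.bin | lfs_pointer_oid`, where the path is a placeholder. Deleting the object on the server leaves this pointer in history, which is why later checkouts of that revision fail.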

A repository admin can delete Git LFS files from a repo as follows:

  1. Go to the Settings page for the repo and click Git LFS to view the list of all LFS files in that repo.
  2. Delete the LFS files using the actions menu.
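
Because the deletion cannot be undone, it may be worth checking first whether any commit still references an object. Here is a rough sketch, assuming you know the object's SHA-256 OID and run it inside the repository. `lfs_oid_is_referenced` is a hypothetical helper name. It relies on `git log -S`, which lists commits that add or remove the given string, and it only looks at commits reachable from refs, not at the reflog.

```shell
#!/bin/sh
# Succeeds if any commit reachable from any ref adds or removes a pointer
# line for the given OID; fails otherwise. Run inside the repository.
# (Hypothetical helper name, not part of git or git-lfs.)
lfs_oid_is_referenced() {
    hits=$(git log --all --format=%H -S"oid sha256:$1")
    [ -n "$hits" ]
}

# Example use (replace <oid> with a real 64-character OID):
#   if lfs_oid_is_referenced <oid>; then
#       echo "still referenced, do not delete"
#   fi
```

If git-lfs is installed, `git lfs ls-files --all --long` gives a similar overview by listing the OIDs referenced across all refs in one go.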

Surprisingly, the only way to remove LFS files from GitHub appears to be to delete and recreate the repository, losing issues, stars, forks, and possibly other data.

Dyslalia answered 28/8, 2017 at 18:30 Comment(1)
Thanks! I found that page not long after I posted the question and started using Bitbucket's LFS management tool to delete the accidental files. However, since I have to git log every file I want to delete to ensure no commit references it, it became clear that this was going to be a very long, manual process. I accidentally committed easily hundreds of files, and sitting here deleting them one by one is really not a great solution. I really hope there will be better tools for managing LFS in future. – Metagenesis
M
3

In the initial steps you followed, I think you've just stumbled on one of the cases where git / git-lfs integration isn't always perfectly seamless.

The reset command would have moved your branch ref back. It would not have actually removed the unwanted commit (or related objects); but that normally wouldn't matter, because those objects are unreachable so would not be sent with a push. So far so good... with vanilla git.
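
A small way to see this for yourself, as a sketch in a throwaway repository (the file names and the placeholder identity are made up, and plain text files stand in for the accidental ones):

```shell
#!/bin/sh
# Sketch: show that `git reset --soft` only moves the branch ref.
# Everything happens in a throwaway repository under a temp directory.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# One good commit followed by two "accidental" ones.
for name in base accident1 accident2; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
accident=$(git rev-parse HEAD)

# Step 2 from the question: move the branch back two commits.
git reset -q --soft HEAD~2

# The abandoned commit object is still in the local object database...
object_type=$(git cat-file -t "$accident")
# ...but no branch contains it any more, so a plain push would not send it.
containing=$(git branch --contains "$accident")

echo "object type: $object_type"
echo "branches containing it: ${containing:-none}"
```

The object type comes back as `commit` while the list of containing branches is empty, which is the "still there, but unreachable" state described above.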

BUT: The LFS objects (the real content of the large files) also weren't deleted prior to your push. AFAIK (and your experience seems to confirm this) LFS does not attempt to determine if LFS objects are reachable when pushing to the remote - which would, after all, seem to be an expensive check. Given that your LFS store is meant to house a bunch of large binary files, and that LFS is designed to mitigate the costs of having a large volume of unneeded data in the LFS store, the cost-benefit would usually favor just sending anything that's not on the server - which is what apparently happened here.

And unless you're facing a limit on physical storage on the server, that may be ok really. No fetch or pull - short of explicitly telling LFS to send you everything, which is not intended for normal usage - is going to cause those files to be downloaded anyway.

But maybe you're running into a storage limit with your repo host. Or maybe you just want them gone; I can't say I'd blame you. That deleting the files locally and pushing does not result in the files being removed from the server is, again, by design. (The same is true of core git objects; you can force-push a ref to make a remote object unreachable, but physically "cleaning up" the remote is independent of any local clean-up.)
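
For the core git side of that parenthetical, here is a sketch of what local clean-up looks like, again in a throwaway repository with made-up file names. It only demonstrates the local half; nothing here reaches the remote.

```shell
#!/bin/sh
# Sketch: local clean-up of an abandoned commit in a throwaway repository.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

echo base > base.txt
git add base.txt
git commit -q -m "base"
echo oops > accident.txt
git add accident.txt
git commit -q -m "accident"
accident=$(git rev-parse HEAD)

# Abandon the commit, then drop the remaining local references to it.
git reset -q --hard HEAD~1
rm -f .git/ORIG_HEAD                      # left behind by `git reset`
git reflog expire --expire=now --all

# With nothing pointing at it, gc is free to delete the object right away.
git gc -q --prune=now

if git cat-file -e "$accident" 2>/dev/null; then
    status="still present"
else
    status="pruned"
fi
echo "abandoned commit: $status"
```

The local counterpart for LFS content is `git lfs prune` (it has a `--dry-run` option if you want to preview what would go). Neither command touches the copy stored by the hosting service.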

Info on removing LFS files from a bitbucket-hosted repo can be found here: https://www.atlassian.com/git/tutorials/git-lfs#deleting-remote-files

Mehetabel answered 28/8, 2017 at 18:36 Comment(2)
I am actually running out of storage in my Bitbucket repo, which is the reason I was concerned with space in the first place. My understanding of vanilla git principles is that git has a garbage collector that runs periodically and removes any objects that no longer have any references to them. The LFS files most certainly do not have any commits referring to them, so by git principles, those files should be automatically removed, right? I suppose since LFS was designed independently of git itself, the same principles don't always apply. – Metagenesis
It is correct that git gc should eventually clean up unreachable git objects, but also beware that how this works on hosted remotes varies with the hosting service. In this sense, git lfs prune is similar to gc: it tries to clean up what's unusable locally, but how the clean-up works server side is not so straightforward. – Mehetabel
G
2

For Bitbucket users, I have a solution that has worked for me for months: https://gist.github.com/danielgindi/db0e0a897d8d920f23e155bb5d59e9c6

You basically open Chrome while in the bitbucket repo and logged in, and put that piece of code in the console. It uses your authorization to go and delete all LFS files older than the specified time, and it takes a few seconds.

Important note: Never run any piece of code in the browser blindly. Look at the code, make sure you understand what it does. I can tell you "trust me", but you don't know me.

Gasaway answered 25/2, 2020 at 8:8 Comment(2)
Pasting script code to run in a trusted environment seems like a bad suggestion, even if the code is benign. But also, why is age of the LFS file a useful metric for what to delete? The most important file in a project can be years old. Shouldn't you be removing files that aren't referenced in any recent branch heads, a much more difficult task? – Caprifig
In my case - the files are irrelevant after a certain period of time. For checking references it would need a deeper action server-side or locally in the git repo. The issue is that git in general has no option for LFS deletion, so a local script would involve extracting website auth tokens etc. – Gasaway
C
1

For a large Unreal Engine project, we've opted to run two repositories. The first is an outer repository that's just a regular repository without LFS and that will never be pruned.

We've then made a submodule out of the Content folder, which uses LFS. This is where Unreal Engine places all large assets.

We can then periodically reset just the repository of the Content folder and push the current state, while keeping full history of the rest of the project, especially the code.

In daily use, this means pushing first the submodule and then the outer module, which we've made a Bash commit script for. But it means that we get to simply not care how much data is put into the Content repository. The history for those assets rarely matters, so we just reset it maybe twice a year.
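
This is not our actual script, but a minimal sketch of the idea, assuming the submodule lives in `Content`, both repositories have a remote called `origin`, and the submodule has a branch checked out (not a detached HEAD). The function name `commit_and_push` is made up.

```shell
#!/bin/sh
# Hypothetical helper: commit and push the LFS-backed submodule first,
# then record its new commit in the outer repository and push that too.
# Usage: commit_and_push "message" [submodule-path]
commit_and_push() {
    message=$1
    submodule=${2:-Content}   # folder name taken from the answer

    # 1. Submodule: stage everything, commit if anything is staged, push.
    git -C "$submodule" add -A || return 1
    git -C "$submodule" diff --cached --quiet ||
        git -C "$submodule" commit -q -m "$message" || return 1
    git -C "$submodule" push -q origin HEAD || return 1

    # 2. Outer repository: staging picks up the submodule's new commit.
    git add -A || return 1
    git diff --cached --quiet || git commit -q -m "$message" || return 1
    git push -q origin HEAD
}
```

Run it from the root of the outer repository, for example `commit_and_push "Update level assets"`. Pushing the submodule first means the outer repository never points at a submodule commit that is missing from the remote.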

It's a little bit inconvenient to juggle two repositories. But this balances our needs of (a) staying with Git, (b) not using Perforce and a checkout model, (c) infinite history for the code, and (d) not stressing about how much data is being committed.

Caprifig answered 23/11, 2023 at 17:14 Comment(0)
