Definitive retroactive .gitignore (how to make Git completely/retroactively *forget* about a file now in .gitignore)

Preface

This question attempts to clear the confusion regarding applying .gitignore retroactively, not just to the present/future. 1

Rationale

I've been searching for a way to make my current .gitignore be retroactively enforced, as if I had created .gitignore in the first commit.

The solution I am seeking:

  • Will not require manually specifying files
  • Will not require a commit
  • Will apply retroactively to all commits of all branches
  • Will ignore .gitignore-specified files in working dir, not delete them (just like an originally root-committed .gitignore file would)
  • Will use git, not BFG
  • Will apply to .gitignore exceptions like:
 *.ext
 !*special.ext
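
A quick way to see how such a rule/exception pair is interpreted is to try it in a throwaway repository. This is only a sketch: the file names data.ext and myspecial.ext are made up for illustration, and a global excludes file on your machine could add further matches.

```shell
# Throwaway repository; nothing here touches a real project.
demo="$(mktemp -d)"
cd "$demo"
git init -q .

# The ignore rule and its exception from the list above.
printf '*.ext\n!*special.ext\n' > .gitignore

# Hypothetical file names, one matching each rule.
echo "junk" > data.ext
echo "wanted" > myspecial.ext

# List untracked files that Git currently treats as ignored.
ignored="$(git ls-files --others --ignored --exclude-standard)"
echo "$ignored"
```

Only data.ext should be listed; myspecial.ext is rescued by the ! exception.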

Not solutions

git rm --cached *.ext
git commit

This requires 1. manually specifying files and 2. an additional commit, which will delete the newly-ignored files for other developers when they pull. (It is effectively just a git rm, that is, a removal from Git tracking, except that it leaves the file alone in your local working directory. Anyone who runs git pull afterwards receives the deletion commit.)
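
The effect on other developers can be reproduced locally. The sketch below assumes a bare repository standing in for the remote and two clones named alice and bob (all hypothetical names); it is an illustration of the problem, not part of any solution.

```shell
set -e
root="$(mktemp -d)"

# A bare "remote" plus a first clone (hypothetical developer "alice").
git init -q --bare "$root/origin.git"
git clone -q "$root/origin.git" "$root/alice"
cd "$root/alice"
git config user.email "alice@example.com"
git config user.name "Alice"

# Alice commits a file that should have been ignored, and publishes it.
echo "junk" > junk.ext
git add junk.ext
git commit -q -m "add junk.ext by mistake"
git push -q origin HEAD

# A second developer ("bob") clones while the file is still tracked.
git clone -q "$root/origin.git" "$root/bob"

# Alice untracks the file with an ordinary commit and pushes it.
git rm -q --cached junk.ext
git commit -q -m "stop tracking junk.ext"
git push -q origin HEAD

# Bob pulls the deletion commit.
git -C "$root/bob" pull -q --ff-only

ls "$root/alice"   # junk.ext is still in Alice's working directory
ls "$root/bob"     # junk.ext has been deleted from Bob's
```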

git filter-branch --index-filter 'git rm --cached *.ext'

While this does purge files retroactively, it 1. requires manually specifying files and 2. deletes the specified files from the local working directory just like plain git rm (and so also for others who git pull)!


Footnotes

1 There are many similar posts here on SO, with loosely defined questions and even less accurate answers. See this question with 23 answers, where the accepted answer with ~4k votes is incorrect according to the standard definition of "forget" (as noted by one mostly-correct answer), and only 2 answers include the required git filter-branch command.

This question with 21 answers was marked as a duplicate of the previous one, but it is defined differently (ignore vs. forget), so while the answers may be appropriate, it is not a duplicate.

This question is the closest I've found to what I'm looking for, but the answers don't work in all cases (paths with spaces...) and perhaps are a bit more complex than necessary regarding creating an external-to-repository .gitignore file and copying it into every commit.

Tantamount answered 8/8, 2019 at 18:31 Comment(14)
Sometimes it's just better to write a script to do the manual things for you.Gracye
Is your goal to rewrite the repository to how it would look if the files in question were never committed (which would invalidate all existing commit IDs, and probably break things for every existing clone/checkout of the repo), or to configure your local working directory such that Git pretends those files are not present in an old commit when you check it out?Divergence
Goal is the former, "as if I had created .gitignore at the beginning". I understand the ramifications, but my repo is local/private and I don't mind a force-push. Although feel free to specify how to handle the latter if you answer - seems it would be useful information.Tantamount
If you understand the ramifications and your repo is local/private, it means that you intend to use a system built to be shared (or to enable collaboration) for a problem that has no need for collaboration or authorship (that's what you do when you invalidate the IDs: you basically don't care who did what). If this is a one-time thing, I presume it is fine, but then I would agree with @Henning Makholm.Switch
I'm a git noob, and I can't imagine I'm the only one who has started to learn git on a local repo and forgotten to start with a .gitignore - the number of "how to make .gitignore actually ignore..." questions here seems to confirm my suspicion. It seems git needs a "found new changes to gitignore file - would you like to ignore these files 1 - never (default), 2 - present/future, or 3 - past/present/future - warning: don't do this with shared repos that have pull requests" feature. It's clear GitHub Desktop is catering to the noob crowd, so seems not-unreasonable for such a feature to existTantamount
Specific to my scenario, I have need for proper version control of an internal project that only I manage, with no reason to ever be shared with the public. Shared internally at some point, sure, but needs quite a bit of work before then. Prior 'version control' method relies on editor's default "copy old version to .BAK file" behavior and twice-daily Volume Shadow Copy snapshots on windows file server. I want something better.Tantamount
Also, it's unclear what you're agreeing with @Henning Makholm about; I don't think he stated an opinion.Tantamount
It's pretty unclear to me why you feel a need to rewrite old commits. What will you get out of that, rather than just have things right going forward?Divergence
A "clean" repo from the start without junk/binary files. Same reason anyone creates a .gitignore file I would assume.Tantamount
There are MANY such posts here on SO, with less-than-specifically-defined questions and even more less-than-accurate accepted answers (see this question with 23 answers, where the accepted answer (with ~4k votes) is WRONG according to the standard definition of "forget", and only 3 answers mention the required git filter-branch command). Not sure why the downvotes for a specifically defined question designed to eliminate less-than-accurate answers.Tantamount
It's going to be like a five-line filter-branch, tops. Put your exclusions in .git/info/exclude, do a git ls-files --exclude-standard -ci and rm --cached them.Skywriting
thanks for the .git/info/exclude tip!Tantamount
Regarding the edit, I'll give way on the preface, even though I think the wording of "forget" includes "retroactively". But please do not use quote blocks as a general highlighter - quote blocks are for quotes.Flapper
Thank you. I agree that forget=retroactively, and would have no need to specify it explicitly, if not for the other incredibly upvoted “completely forget” question with an accepted answer that only applies to the present/future. Perhaps that question should be edited to be more explicit (present/future only) as well?Tantamount

This may be only a partial answer but here is how I accomplished retroactively removing files from previous git commits based on my current .gitignore file:

  1. Make a backup of the repo folder you are working on. I just made a .7z archive of the entire folder.
  2. Install git-filter-repo
  3. Copy your .gitignore file somewhere else temporarily. Since I'm on Windows and using Command Prompt, I ran copy .gitignore ..\ and just made the temp copy one directory level up
  4. If your .gitignore file has wildcard filters (like nbproject/Makefile-*), you'll need to edit your temp copied .gitignore file so those lines read glob:nbproject/Makefile-*
  5. Run git filter-repo --invert-paths --paths-from-file ..\.gitignore. My understanding is that this uses the temp copy as a list of files/directories to remove. Note: if you receive an error regarding your repo not being a clean clone, search for "FRESH CLONE SAFETY CHECK AND --FORCE" in the git-filter-repo help. Be careful.

For more info see: git-filter-repo help (Search for "Filtering based on many paths")
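
As an illustration of steps 3 to 5, a paths file for git-filter-repo might look like the sketch below. Only the nbproject/Makefile-* line comes from the steps above; the build and *.ext entries are hypothetical. Per the git-filter-repo help, plain lines are literal paths, glob: marks wildcard patterns, and blank lines and lines starting with # are ignored. The sketch does not try to reproduce .gitignore's ! exceptions, so treat it as a hand-made approximation of a .gitignore rather than a drop-in copy.

```shell
# Write a hand-made paths file to a temporary location outside the repository.
paths_file="$(mktemp)"
cat > "$paths_file" <<'EOF'
# literal path: a file or a whole directory (hypothetical entry)
build
# wildcard patterns need the glob: prefix
glob:nbproject/Makefile-*
glob:*.ext
EOF

cat "$paths_file"

# Then, from inside the backed-up repository (not run here):
#   git filter-repo --invert-paths --paths-from-file "$paths_file"
```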

Disclaimer: I have no idea what I'm doing but this worked for me.

Roslynrosmarin answered 19/4, 2022 at 2:8 Comment(0)

EDIT: I've recently found git-filter-repo. It may be a better choice. Perhaps a good idea to investigate the rationale and filter-branch gotchas for yourself, but they wouldn't have affected my use-case below.


This method makes Git completely forget ignored files (past/present/future), but does not delete anything from working directory (even when re-pulled from remote).

This method requires usage of /.git/info/exclude (preferred) OR a pre-existing .gitignore in all the commits that have files to be ignored/forgotten. 1

This method avoids removing the newly-ignored files from other developers' machines on the next git pull. 2

All methods of enforcing Git ignore behavior after-the-fact effectively re-write history and thus have significant ramifications for any public/shared/collaborative repos that might be pulled after this process. 3

General advice: start with a clean repo - everything committed, nothing pending in working directory or index, and make a backup!

Also, the comments/revision history of this answer (and revision history of this question) may be useful/enlightening.

#commit up-to-date .gitignore (if not already existing)
#these commands must be run on each branch
#these commands are not strictly necessary if you don't want/need a .gitignore file.  .git/info/exclude can be used instead

git add .gitignore
git commit -m "Create .gitignore"

#apply standard git ignore behavior only to current index, not working directory (--cached)
#if this command returns nothing, ensure /.git/info/exclude AND/OR .gitignore exist
#this command must be run on each branch
#if using .git/info/exclude, it will need to be modified per branch run, if the branches have differing (per-branch) .gitignore requirements.

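#newer Git releases refuse --ignored unless --cached (-c) or --others (-o) is also given; --cached selects tracked files

git ls-files -z --cached --ignored --exclude-standard | xargs -r0 git rm --cached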

#Commit to prevent working directory data loss!
#this commit will be automatically deleted by the --prune-empty flag in the following command
#this command must be run on each branch
#optionally use the --amend flag to merge this commit with the previous one instead of creating 2 commits.

git commit -m "ignored index"

#Apply standard git ignore behavior RETROACTIVELY to all commits from all branches (--all)
#This step WILL delete ignored files from working directory UNLESS they have been dereferenced from the index by the commit above
#This step will also delete any "empty" commits.  If deliberate "empty" commits should be kept, remove --prune-empty and instead run git reset HEAD^ immediately after this command

git filter-branch --tree-filter 'git ls-files -z --cached --ignored --exclude-standard | xargs -r0 git rm -f --ignore-unmatch' --prune-empty --tag-name-filter cat -- --all

#List all still-existing files that are now ignored properly
#if this command returns nothing, it's time to restore from backup and start over
#this command must be run on each branch

git ls-files --other --ignored --exclude-standard
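
To try the whole sequence safely first, here is a minimal end-to-end sketch in a throwaway repository. The file names are hypothetical, the patterns are copied into .git/info/exclude so that commits predating .gitignore are covered, --cached is passed to git ls-files because newer Git releases require -c or -o alongside --ignored, and FILTER_BRANCH_SQUELCH_WARNING only silences the filter-branch startup warning.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with hypothetical file names.
demo="$(mktemp -d)"
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# History that predates .gitignore: junk.ext gets tracked by mistake.
echo "source" > keep.txt
echo "junk" > junk.ext
echo "special" > myspecial.ext
git add .
git commit -q -m "first commit, junk included"

# Commit the .gitignore and mirror it into .git/info/exclude,
# which also applies to commits that have no .gitignore yet.
printf '*.ext\n!*special.ext\n' > .gitignore
mkdir -p .git/info
cp .gitignore .git/info/exclude
git add .gitignore
git commit -q -m "Create .gitignore"

# Untrack ignored files in the index only, then commit to protect the working copy.
git ls-files -z --cached --ignored --exclude-standard | xargs -r0 git rm -q --cached
git commit -q -m "ignored index"

# Rewrite every commit on every ref.
git filter-branch --tree-filter 'git ls-files -z --cached --ignored --exclude-standard | xargs -r0 git rm -f --ignore-unmatch' --prune-empty --tag-name-filter cat -- --all

# The rewritten history no longer mentions junk.ext, yet the file is still on disk.
git log --name-only --pretty=format:'%s' HEAD
ls
```

In this sketch the rewritten branch should end up with two commits (the "ignored index" commit is pruned as empty), and junk.ext should remain in the working directory as an ignored, untracked file. Note that git log --all would still show the old commits until refs/original is cleaned up.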

Finally, follow the rest of this GitHub guide (starting at step 6) which includes important warnings/information about the commands below.

git push origin --force --all
git push origin --force --tags
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now

Other devs that pull from now-modified remote repo should make a backup and then:

#fetch modified remote

git fetch --all

#"Pull" changes WITHOUT deleting newly-ignored files from working directory
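#This moves the current branch and index to the fetched commit but leaves files in the working directory as they are
#Local modifications will then show up as unstaged changes - ensure they are backed-up/stashed first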

git reset FETCH_HEAD

Footnotes

1 Because /.git/info/exclude can be applied to all historical commits using the instructions above, perhaps details about getting a .gitignore file into the historical commit(s) that need it is beyond the scope of this answer. I wanted a proper .gitignore to be in the root commit, as if it was the first thing I did. Others may not care since /.git/info/exclude can accomplish the same thing regardless where the .gitignore exists in the commit history, and clearly re-writing history is a very touchy subject, even when aware of the ramifications.

FWIW, potential methods may include git rebase or a git filter-branch that copies an external .gitignore into each commit, like the answers to this question.

2 Enforcing git ignore behavior after-the-fact by committing the results of a standalone git rm --cached command may result in newly-ignored file deletion in future pulls from the force-pushed remote. The --prune-empty flag in the git filter-branch command (or git reset HEAD^ afterwards) avoids this problem by automatically removing the previous "delete all ignored files" index-only commit.

3 Re-writing git history also changes commit hashes, which will wreak havoc on future pulls from public/shared/collaborative repos. Please understand the ramifications fully before doing this to such a repo. This GitHub guide specifies the following:

Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.

Alternative solutions that do not affect the remote repo are git update-index --assume-unchanged </path/file> or git update-index --skip-worktree <file>, examples of which can be found here.
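
As a rough sketch of the --skip-worktree variant (the file name settings.local is hypothetical), the following shows a tracked file whose local edits Git stops reporting, without any history rewrite:

```shell
set -e
demo="$(mktemp -d)"
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# A tracked file that each developer wants to edit locally (hypothetical name).
echo "default" > settings.local
git add settings.local
git commit -q -m "track settings.local"

# Tell Git to stop looking at the working-tree copy of this file.
git update-index --skip-worktree settings.local
echo "my machine only" >> settings.local

# While the flag is set, the local edit is not reported.
status_while_skipped="$(git status --porcelain)"
flag_line="$(git ls-files -v settings.local)"
echo "status: [$status_while_skipped]"
echo "$flag_line"

# Undo the flag when the file should be tracked normally again.
git update-index --no-skip-worktree settings.local
```

git ls-files -v marks skip-worktree entries with an S in the first column. The file stays in the repository history and on the remote; only your local view changes.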

Tantamount answered 11/8, 2019 at 22:53 Comment(5)
I had hoped to use git filter-branch --index-filter 'git ls-files -z --ignored --exclude-from=.gitignore | xargs -0 git rm --cached --ignore-unmatch' --prune-empty --tag-name-filter cat -- --all instead, but got fatal: cannot use .gitignore as an exclude fileTantamount
Couple potential .gitignore "injection" solutions - git rebase..., or git-filter-branch that copies external .gitignore into every commit like the answers to this questionTantamount
Unfortunately this doesn't work in Windows because of xargs.Manton
This was developed/tested on windows Git - use it in Cygwin (Git Bash), not CMD.Tantamount
If you just want to delete this eclipse .project file that you accidentally committed very recently, then git-filter-repo is ridiculously oversized for the task. I ran git-filter-repo --analyze, opened the folder where it wrote the analysis result, and I am absolutely, completely intimidated. Using git-filter-repo looks an order of magnitude harder than doing the few required actions completely by hand, as in checkout master, checkout -b feex, fix .gitignore, checkout topic, rebase -i feex.Stewpan
