How do you fix a bad merge, and replay your good commits onto a fixed merge?
A

12

438

I accidentally committed an unwanted file (filename.orig, created while resolving a merge) to my repository several commits ago, and I did not notice until now. I want to completely delete the file from the repository history.

Is it possible to rewrite the change history such that filename.orig was never added to the repository in the first place?

Ant answered 21/11, 2008 at 4:11 Comment(2)
Related: How to remove/delete a large file from commit history in Git repository?.Taite
related help.github.com/articles/…Ledezma
B
311

Please don't use this recipe if your situation is not the one described in the question. This recipe is for fixing a bad merge, and replaying your good commits onto a fixed merge.

Although filter-branch will do what you want, it is quite a complex command and I would probably choose to do this with git rebase. It's probably a personal preference. filter-branch can do it in a single, slightly more complex command, whereas the rebase solution is performing the equivalent logical operations one step at a time.

Try the following recipe:

# create and check out a temporary branch at the location of the bad merge
git checkout -b tmpfix <sha1-of-merge>

# remove the incorrectly added file
git rm somefile.orig

# commit the amended merge
git commit --amend

# go back to the master branch
git checkout master

# replant the master branch onto the corrected merge
git rebase tmpfix

# delete the temporary branch
git branch -d tmpfix

(Note that you don't actually need a temporary branch; you can do this with a 'detached HEAD'. In that case, note the commit id generated by the git commit --amend step and supply it to the git rebase command in place of the temporary branch name.)
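The detached-HEAD variant mentioned in the note can be sketched as below. This is only an illustration in a throwaway repository created with mktemp; the file names, commit messages and the demo identity are assumptions made up for the example, not part of the original recipe.

```shell
#!/bin/sh
# Throwaway repository; identity values are placeholders for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > a.txt; git add a.txt; git commit -q -m "base"
git checkout -q -b topic
echo topic > b.txt; git add b.txt; git commit -q -m "topic work"
git checkout -q master
echo main > c.txt; git add c.txt; git commit -q -m "master work"

# The bad merge: a stray filename.orig is committed along with it.
git merge -q --no-ff --no-commit topic
echo junk > filename.orig; git add filename.orig
git commit -q -m "Merge topic"
merge=$(git rev-parse HEAD)

# A good commit made after the bad merge.
echo later > d.txt; git add d.txt; git commit -q -m "good commit"

# The recipe, without a temporary branch.
git checkout -q "$merge"            # detached HEAD at the bad merge
git rm -q filename.orig
git commit -q --amend --no-edit
fixed=$(git rev-parse HEAD)         # remember the amended merge
git checkout -q master
git rebase -q "$fixed"              # replay the good commits onto it

# Count commits on master that touch the unwanted file.
leftover=$(git log --format=%H master -- filename.orig | wc -l | tr -d ' ')
echo "commits still touching filename.orig: $leftover"
```

The only difference from the branch-based recipe is that the amended commit is tracked in a shell variable instead of a branch name, so there is nothing to delete afterwards.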

Buchanan answered 21/11, 2008 at 13:2 Comment(11)
Wouldn't a git rebase -i be faster and still as easy? $ git rebase -i <sha1-of-merge> Mark the correct one as "edit" $ git rm somefile.orig $ git commit --amend $ git rebase --continue However for some reason I still have that file somewhere the last time I did that. Probably missing something.Minorite
git rebase -i is very useful, especially when you have multiple rebase-y operations to perform, but it's a right pain to describe accurately when you're not actually pointing over someone's shoulder and can see what they're doing with their editor. I use vim, but not everyone would be happy with: "ggjcesquash<Esc>jddjp:wq" and instructions like "Move the top line to after the current second line and change the first word on line four to 'edit' now save and quit" quickly seem more complex than the actual steps are. You normally end up with some --amend and --continue actions, as well.Buchanan
I did this but a new commit was reapplied on top of the amended one, with the same message. Apparently git did a 3 way merge between the old, unamended commit containing the unwanted file, and the fixed commit from the other branch, and so it created a new commit on top of the old one, to re-apply the file.Crusted
I tried to do this when I realized an OS X .DS_Store was added but 1) the commit where it was added was duplicated by the rebase 2) Even with a rebase --onto (already complicating the recipe), of course more recent versions had a changed .DS_Store, so filter-branch ended up doing the trick for meOsteopath
@UncleCJ: Was your file added in a merge commit? This is important. This recipe is designed to cope with a bad merge commit. It's not going to work if your unwanted file was added in a normal commit in history.Buchanan
I'm amazed how I could do all this using smartgit and no terminal at all! Thanks for the recipe!Blinni
@CharlesBailey I used your instructions to alter the history of a local branch (i.e. my files didn't come from a merge) and it worked great. FYI.Unavoidable
FYI: Doing this for large files, the rebase can take quite a while. Looks frozen, but it eventually works.Crouch
This was way more involved than I expected. I ended up having to repeatedly delete the file (in each commit) then git add --all :/ followed by git rebase --continue (as it climbed its way through each commit). If that didn't work, then git rebase --skip. Then finally, manually fix the <<<<< conflicts I found in the files. It seems to have taken, but the .git folder is still enormous. I'm not totally sure what I even did, to be honest, but it seems a step in the right direction. (I just realized I deleted the files via Windows Explorer, NOT with git rm. Not sure the consequences...)Crouch
I accidentally added a HUGE file one day. I ended up just nuking my .git dir, re-adding files, and starting the git repo fresh.Contreras
This is pretty cool. I never fully understood what git rebase does. Now I do! Thank you.Palmetto
T
235

Intro: You Have 5 Solutions Available

The original poster states:

I accidentally committed an unwanted file...to my repository several commits ago...I want to completely delete the file from the repository history.

Is it possible to rewrite the change history such that filename.orig was never added to the repository in the first place?

There are many different ways to remove the history of a file completely from git:

  1. Amending commits.
  2. Hard resets (possibly plus a rebase).
  3. Non-interactive rebase.
  4. Interactive rebases.
  5. Filtering branches.

In the case of the original poster, amending the commit isn't really an option by itself, since he made several additional commits afterwards, but for the sake of completeness, I will also explain how to do it, for anyone else who just wants to amend their previous commit.

Note that all of these solutions involve altering/re-writing history/commits in one way or another, so anyone with old copies of the commits will have to do extra work to re-sync their history with the new history.
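As a rough sketch of what that re-sync can look like, the following sets up a local bare repository and two clones (the names origin.git, alice and bob are hypothetical), rewrites history in one clone, force-pushes, and then brings the other clone in line with a fetch and a hard reset. It assumes the second clone has no local work worth keeping, because a hard reset discards it.

```shell
#!/bin/sh
# Throwaway setup; identity values and repository names are placeholders.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work" || exit 1

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# "alice" publishes a commit that contains the unwanted file.
git init -q alice && cd alice || exit 1
git symbolic-ref HEAD refs/heads/master
git remote add origin ../origin.git
echo base > a.txt; git add a.txt; git commit -q -m "base"
echo work > b.txt; echo junk > filename.orig
git add b.txt filename.orig; git commit -q -m "work plus stray file"
git push -q origin master
cd ..

# "bob" clones the old history.
git clone -q origin.git bob

# alice rewrites her last commit and force-pushes the new history.
cd alice || exit 1
git rm -q filename.orig
git commit -q --amend --no-edit
git push -q --force origin master
alice_head=$(git rev-parse HEAD)

# bob re-syncs; this throws away anything only bob had on master.
cd ../bob || exit 1
git fetch -q origin
git reset -q --hard origin/master
bob_head=$(git rev-parse HEAD)
echo "alice: $alice_head"
echo "bob:   $bob_head"
```

A collaborator who does have unpublished commits would more likely rebase them onto the new history than reset, so treat the reset here as the simplest case only.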


Solution 1: Amending Commits

If you accidentally made a change (such as adding a file) in your previous commit, and you don't want the history of that change to exist anymore, then you can simply amend the previous commit to remove the file from it:

git rm <file>
git commit --amend --no-edit

Solution 2: Hard Reset (Possibly Plus a Rebase)

Like solution #1, if you just want to get rid of your previous commit, then you also have the option of simply doing a hard reset to its parent:

git reset --hard HEAD^

That command will hard-reset your branch to the previous 1st parent commit.

However, if, like the original poster, you've made several commits after the commit you want to undo the change to, you can still use hard resets to modify it, but doing so also involves using a rebase. Here are the steps that you can use to amend a commit further back in history:

# Create a new branch at the commit you want to amend
git checkout -b temp <commit>

# Amend the commit
git rm <file>
git commit --amend --no-edit

# Rebase your previous branch onto this new commit, starting from the old-commit
git rebase --rebase-merges --onto temp <old-commit> master

# Verify your changes
git diff master@{1}

Solution 3: Non-interactive Rebase

This will work if you just want to remove a commit from history entirely:

# Create a new branch at the parent-commit of the commit that you want to remove
git branch temp <parent-commit>

# Rebase onto the parent-commit, starting from the commit-to-remove
git rebase --rebase-merges --onto temp <commit-to-remove> master

# Or use `-r` instead of the longer `--rebase-merges`
git rebase -r --onto temp <commit-to-remove> master

# Verify your changes
git diff master@{1}
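For a linear history, the same removal can be sketched without the temporary branch by naming the parent directly as <commit-to-remove>^. The repository below is a throwaway one and its file names are invented for the illustration; --rebase-merges is left out only because this demo has no merges.

```shell
#!/bin/sh
# Throwaway repository; identity values are placeholders for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > a.txt; git add a.txt; git commit -q -m "base"
echo junk > filename.orig; git add filename.orig
git commit -q -m "unwanted commit"
remove=$(git rev-parse HEAD)
echo more > b.txt; git add b.txt; git commit -q -m "good commit"

# Replay everything after $remove onto the parent of $remove.
git rebase -q --onto "$remove^" "$remove" master

count=$(git rev-list --count master)
echo "commits on master after the rebase: $count"
```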

Solution 4: Interactive Rebases

This solution will allow you to accomplish the same things as solutions #2 and #3, i.e. modify or remove commits further back in history than your immediately previous commit, so which solution you choose to use is sort of up to you. Interactive rebases are not well-suited to rebasing hundreds of commits, for performance reasons, so I would use non-interactive rebases or the filter branch solution (see below) in those sorts of situations.

To begin the interactive rebase, use the following:

git rebase --interactive <commit-to-amend-or-remove>~

# Or `-i` instead of the longer `--interactive`
git rebase -i <commit-to-amend-or-remove>~

This will cause git to rewind the commit history back to the parent of the commit that you want to modify or remove. It will then present you a list of the rewound commits, oldest first (the reverse of the order that git log shows), in whatever editor git is set to use (vi unless you have configured another one):

pick 00ddaac Add symlinks for executables
pick 03fa071 Set `push.default` to `simple`
pick 7668f34 Modify Bash config to use Homebrew recommended PATH
pick 475593a Add global .gitignore file for OS X
pick 1b7f496 Add alias for Dr Java to Bash config (OS X)

The commit that you want to modify or remove will be at the top of this list. To remove it, simply delete its line in the list. Otherwise, replace "pick" with "edit" on the 1st line, like so:

edit 00ddaac Add symlinks for executables
pick 03fa071 Set `push.default` to `simple`

Next, save the list and close the editor; git then starts the rebase. If you chose to remove the commit entirely, then that is all you need to do (other than verification, see final step for this solution). If, on the other hand, you wanted to modify the commit, then git will reapply the commit and then pause the rebase.

Stopped at 00ddaacab0a85d9989217dd9fe9e1b317ed069ac... Add symlinks
You can amend the commit now, with

        git commit --amend

Once you are satisfied with your changes, run

        git rebase --continue

At this point, you can remove the file and amend the commit, then continue the rebase:

git rm <file>
git commit --amend --no-edit
git rebase --continue

That's it. As a final step, whether you modified the commit or removed it completely, it's always a good idea to verify that no other unexpected changes were made to your branch by diffing it with its state before the rebase:

git diff master@{1}
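If you would rather not edit the todo list by hand, one possible sketch is to let GIT_SEQUENCE_EDITOR do the edit: git runs that command on the todo file instead of opening your editor. The sed expression below marks the first line as "edit"; the repository and file names are made up for the demo, and the approach assumes the commit to amend is the first entry in the list.

```shell
#!/bin/sh
# Throwaway repository; identity values are placeholders for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > a.txt; git add a.txt; git commit -q -m "base"
echo feature > feature.txt; echo junk > filename.orig
git add feature.txt filename.orig; git commit -q -m "add feature"
bad=$(git rev-parse HEAD)
echo more > b.txt; git add b.txt; git commit -q -m "more work"

# Turn "pick" into "edit" on the first todo line, without an editor.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" \
    git rebase -i "$bad~" >/dev/null 2>&1

# The rebase is now paused at the commit to amend.
git rm -q filename.orig
git commit -q --amend --no-edit
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

leftover=$(git log --format=%H master -- filename.orig | wc -l | tr -d ' ')
branch=$(git symbolic-ref --short HEAD)
echo "on $branch, commits still touching filename.orig: $leftover"
```

The -i.bak form of sed is used because it is accepted by both GNU and BSD sed; it leaves a small backup of the todo file inside the .git directory.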

Solution 5: Filtering Branches

Finally, this solution is best if you want to completely wipe out all traces of a file's existence from history, and none of the other solutions are quite up to the task.

git filter-branch --index-filter \
'git rm --cached --ignore-unmatch <file>'

That will remove <file> from all commits, starting from the root commit. If instead you just want to rewrite the commit range HEAD~5..HEAD, then you can pass that as an additional argument to filter-branch, as pointed out in this answer:

git filter-branch --index-filter \
'git rm --cached --ignore-unmatch <file>' HEAD~5..HEAD

Again, after the filter-branch is complete, it's usually a good idea to verify that there are no other unexpected changes by diffing your branch with its previous state before the filtering operation:

git diff master@{1}
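One thing worth checking after a filter-branch run is which refs still reach the file. The sketch below, in a throwaway repository with invented file names, shows that the rewritten branch no longer contains the file while the backup ref that filter-branch leaves under refs/original/ still does. FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that recent git versions add.

```shell
#!/bin/sh
# Throwaway repository; identity values are placeholders for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > a.txt; git add a.txt; git commit -q -m "base"
echo work > b.txt; echo junk > filename.orig
git add b.txt filename.orig; git commit -q -m "work plus stray file"
echo more > c.txt; git add c.txt; git commit -q -m "more work"

git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch filename.orig' -- --all \
    >/dev/null 2>&1

# Commits touching the file on the rewritten branch.
in_branch=$(git log --format=%H master -- filename.orig | wc -l | tr -d ' ')
# Commits touching the file in the backup of the old history.
in_backup=$(git log --format=%H refs/original/refs/heads/master \
    -- filename.orig | wc -l | tr -d ' ')
echo "rewritten master: $in_branch, backup ref: $in_backup"
```

As long as that backup ref (or a reflog entry) exists, the old objects stay in the repository, which is why the file can appear to survive a successful rewrite.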

Filter-Branch Alternative: BFG Repo Cleaner

I've heard that the BFG Repo Cleaner tool runs faster than git filter-branch, so you might want to check that out as an option too. It's even mentioned officially in the filter-branch documentation as a viable alternative:

git-filter-branch allows you to make complex shell-scripted rewrites of your Git history, but you probably don’t need this flexibility if you’re simply removing unwanted data like large files or passwords. For those operations you may want to consider The BFG Repo-Cleaner, a JVM-based alternative to git-filter-branch, typically at least 10-50x faster for those use-cases, and with quite different characteristics:

  • Any particular version of a file is cleaned exactly once. The BFG, unlike git-filter-branch, does not give you the opportunity to handle a file differently based on where or when it was committed within your history. This constraint gives the core performance benefit of The BFG, and is well-suited to the task of cleansing bad data - you don’t care where the bad data is, you just want it gone.

  • By default The BFG takes full advantage of multi-core machines, cleansing commit file-trees in parallel. git-filter-branch cleans commits sequentially (ie in a single-threaded manner), though it is possible to write filters that include their own parallellism, in the scripts executed against each commit.

  • The command options are much more restrictive than git-filter branch, and dedicated just to the tasks of removing unwanted data- e.g: --strip-blobs-bigger-than 1M.

Additional Resources

  1. Pro Git § 6.4 Git Tools - Rewriting History.
  2. git-filter-branch(1) Manual Page.
  3. git-commit(1) Manual Page.
  4. git-reset(1) Manual Page.
  5. git-rebase(1) Manual Page.
  6. The BFG Repo Cleaner (see also this answer from the creator himself).
Taite answered 20/4, 2014 at 23:10 Comment(4)
Does filter-branch cause recalculating of hashes? If a team works with a repo where a big file should be filtered, how do they do this so that everybody ends up with the same state of the repo?Dyestuff
@YakovL. Everything recalculates hashes. Actually commits are immutable. It creates an entirely new history, and moves your branch pointer to it. The only way to ensure everyone has the same history is a hard reset.Aurelie
NB! The old --preserve-merges is replaced with --rebase-merges some time ago.Eleanoraeleanore
@Eleanoraeleanore Maybe you could offer this as an edit to this answer?Mosqueda
S
122

If you haven't committed anything since, just git rm the file and git commit --amend.

If you have, then

git filter-branch \
--index-filter 'git rm --cached --ignore-unmatch path/to/file/filename.orig' merge-point..HEAD

will go through each change from merge-point to HEAD, delete filename.orig and rewrite the change. Using --ignore-unmatch means the command won't fail if for some reason filename.orig is missing from a change. That's the recommended way from the Examples section in the git-filter-branch man page.

Note for Windows users: The file path must use forward slashes

Sweeten answered 14/3, 2009 at 20:44 Comment(6)
Thanks! git filter-branch worked for me where the rebase example given as an answer didn't: The steps seemed to work, but then pushing failed. Did a pull, then pushed successfully, but the file was still around. Tried to redo the rebase steps and then it went all messy with merge conflicts. I used a slightly different filter-branch command though, the "An Improved Method" one given here: github.com/guides/completely-remove-a-file-from-all-revisions git filter-branch -f --index-filter 'git update-index --remove filename' <introduction-revision-sha1>..HEADSirius
I'm not sure which one is the improved method. Git official documentation of git-filter-branch seem to give the first one.Minorite
Check out zyxware.com/articles/4027/… I find it the most complete and straight forward solution that involves filter-branchAdjudicate
Thanks! I would suggest just one improvement: --prune-empty; this option to git filter-branch will remove any commits that only touched the file you want to remove (since after removal, they would just be empty commits)Calculous
@atomicules, if you will try to push the local repo to the remote one, git will insist on pulling from the remote first, because it has changes that you don't have locally. You can use --force flag to push to the remote - it will remove the files from there entirely. But be careful tho, make sure you won't force overwrite something other than the files only.Homesick
Remember to use " and not ' when using Windows, or you'll get an unhelpfully phrased "bad revision" error.Afferent
H
50

This is the best way:
http://github.com/guides/completely-remove-a-file-from-all-revisions

Just be sure to back up the copies of the files first.

EDIT

The edit by Neon unfortunately got rejected during review.
See Neon's post below; it might contain useful information!


E.g. to remove all *.gz files accidentally committed into git repository:

$ du -sh .git    # e.g. 100M
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch *.gz' HEAD
$ git push origin master --force
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
$ git gc --aggressive --prune=now

That still didn't work for me? (I am currently at git version 1.7.6.1)

$ du -sh .git    # e.g. 100M

Not sure why, since I only had ONE master branch. Anyway, I finally got my git repo truly cleaned up by pushing into a new, empty, bare git repository, e.g.

$ git init --bare /path/to/newcleanrepo.git
$ git push /path/to/newcleanrepo.git master
$ du -sh /path/to/newcleanrepo.git    # e.g. 5M

(yes!)

Then I cloned that to a new directory and moved its .git folder into this one, e.g.

$ mv .git ../large_dot_git
$ git clone /path/to/newcleanrepo.git ../tmpdir
$ mv ../tmpdir/.git .
$ du -sh .git    # e.g. 5M

(yeah! finally cleaned up!)

After verifying that all is well, you can delete the ../large_dot_git and ../tmpdir directories (maybe a couple of weeks or a month from now, just in case...)

Harrovian answered 4/2, 2010 at 5:52 Comment(2)
This worked for me before the "That still didn't work for me?" commentOhl
Great answer, but suggest adding --prune-empty to filter-branch command.Quieten
C
27

Rewriting Git history demands changing all the affected commit ids, and so everyone who's working on the project will need to delete their old copies of the repo, and do a fresh clone after you've cleaned the history. The more people it inconveniences, the more you need a good reason to do it - your superfluous file isn't really causing a problem, but if only you are working on the project, you might as well clean up the Git history if you want to!

To make it as easy as possible, I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing files from Git history. One way in which it makes your life easier here is that it actually handles all refs by default (all tags, branches, etc) but it's also 10 - 50x faster.

You should carefully follow the steps here: http://rtyley.github.com/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar --delete-files filename.orig my-repo.git

Your entire repository history will be scanned, and any file named filename.orig (that's not in your latest commit) will be removed. This is considerably easier than using git-filter-branch to do the same thing!

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Copolymer answered 31/3, 2013 at 12:35 Comment(2)
This is an excellent tool: a single command, it produces very clear output and provides a log file that matches every old commit to the new one. I don't like installing Java but this is worth it.Limit
This is the only thing that worked for me but that's likely because I wasn't working git filter-branch correctly. :-)Darnel
V
17

You should probably clone your repository first.

Remove your file from all branches history:

git filter-branch --tree-filter 'rm -f filename.orig' -- --all

Remove your file just from the current branch:

git filter-branch --tree-filter 'rm -f filename.orig' -- HEAD

Lastly, to remove empty commits, you should run:

git filter-branch -f --prune-empty -- --all
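The removal and the pruning can probably also be combined into a single pass by giving --prune-empty together with the tree filter. The following is a sketch in a throwaway repository (file names invented for the demo) in which one commit does nothing except add the unwanted file, so it should disappear entirely after the rewrite.

```shell
#!/bin/sh
# Throwaway repository; identity values are placeholders for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > a.txt; git add a.txt; git commit -q -m "base"
echo junk > filename.orig; git add filename.orig
git commit -q -m "only adds the stray file"
echo more > b.txt; git add b.txt; git commit -q -m "good commit"

# Remove the file and drop commits that become empty, in one run.
git filter-branch --prune-empty \
    --tree-filter 'rm -f filename.orig' -- --all >/dev/null 2>&1

count=$(git rev-list --count master)
echo "commits on master after the rewrite: $count"
```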
Vulgarism answered 10/6, 2016 at 6:35 Comment(1)
While all of the answers seem to be on the filter-branch track, this one highlights how to clean ALL branches in your history.Caesura
M
4

Just to add to Charles Bailey's solution: I used git rebase -i to remove unwanted files from an earlier commit and it worked like a charm. The steps:

# Pick your commit with 'e'
$ git rebase -i

# Perform as many removes as necessary
$ git rm project/code/file.txt

# amend the commit
$ git commit --amend

# continue with rebase
$ git rebase --continue
Manchineel answered 16/10, 2013 at 13:10 Comment(0)
O
4

The simplest way I found was suggested by leontalbot (as a comment), which is a post published by Anoopjohn. I think it's worth its own space as an answer:

(I converted it to a bash script)

#!/bin/bash
if [[ -z $1 ]]; then
    # Print usage to stderr and exit with a non-zero status
    echo "Usage: $0 FILE_OR_DIR [remote]" >&2
    echo "FILE_OR_DIR: the file or directory you want to remove from history" >&2
    echo "if 'remote' argument is set, it will also push to remote repository." >&2
    exit 1
fi
FOLDERNAME_OR_FILENAME=$1;

#The important part starts here: ------------------------

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch $FOLDERNAME_OR_FILENAME" -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

if [[ $2 == "remote" ]]; then
    git push --all --force
fi
echo "Done."

All credit goes to Anoopjohn, and to leontalbot for pointing it out.

NOTE

Be aware that the script doesn't include validations, so be sure you don't make mistakes and that you have a backup in case something goes wrong. It worked for me, but it may not work in your situation. USE IT WITH CAUTION (follow the link if you want to know what is going on).

Oswell answered 17/5, 2016 at 2:26 Comment(0)
G
3

Definitely, git filter-branch is the way to go.

Sadly, this will not suffice to completely remove filename.orig from your repo, as it can still be referenced by tags, reflog entries, remotes and so on.

I recommend removing all these references as well, and then calling the garbage collector. You can use the git forget-blob script from this website to do all this in one step.

git forget-blob filename.orig

Grout answered 30/1, 2017 at 12:54 Comment(1)
"is the way to go" - It no longer is - even the docu says you should use git filter-repo insteadAdlay
S
1

If it's the latest commit you want to clean up, I tried with git version 2.14.3 (Apple Git-98):

touch empty
git init
git add empty
git commit -m init

# 92K   .git
du -hs .git

dd if=/dev/random of=./random bs=1m count=5
git add random
git commit -m mistake

# 5.1M  .git
du -hs .git

git reset --hard HEAD^
git reflog expire --expire=now --all
git gc --prune=now

# 92K   .git
du -hs .git
Sanguinolent answered 29/3, 2018 at 15:40 Comment(2)
git reflog expire --expire=now --all; git gc --prune=now is a very bad thing to do. Unless you're running out of disk space, let git garbage collect these commits after a few weeksConundrum
Thanks for pointing that out. My repo was submitted with many large binary files and the repo is backed up entirely every night. So I just wanted every bit out of it ;)Sanguinolent
D
0

This is what git filter-branch was designed for.

Denunciation answered 21/11, 2008 at 10:26 Comment(0)
B
-2

You can also use:

git reset HEAD file/path

Burstone answered 3/9, 2009 at 4:0 Comment(1)
If the file has been added to a commit then this doesn't even remove the file from the index, it just resets the index to the HEAD version of the file.Buchanan
