How do I remove the old history from a git repository?
P

13

291

I'm afraid I couldn't find anything quite like this particular scenario.

I have a git repository with a lot of history: 500+ branches, 500+ tags, going back to mid-2007. It contains ~19,500 commits. We'd like to remove all of the history before Jan 1, 2010, to make it smaller and easier to deal with (we would keep a complete copy of the history in an archive repository).

I know the commit that I want to have become the root of the new repository. I can't, however, figure out the correct git mojo to truncate the repo to start with that commit. I'm guessing some variant of

git filter-branch

involving grafts would be necessary; it might also be necessary to treat each of the 200+ branches we want to keep separately and then patch the repo back together (something I do know how to do).

Has anyone ever done something like this? I've got git 1.7.2.3 if that matters.

Pace answered 23/12, 2010 at 3:7 Comment(0)
E
131

Note: this has been deprecated in favor of git replace.

You can create a graft that gives your new root commit no parent (or an empty commit as its parent, e.g. the real root commit of your repository). A grafts line containing only a commit ID declares that commit parentless, e.g. echo "<NEW-ROOT-SHA1>" > .git/info/grafts

After creating the graft, it takes effect right away; you should be able to look at git log and see that the unwanted old commits have gone away:

$ echo 4a46bc886318679d8b15e05aea40b83ff6c3bd47 > .git/info/grafts
$ git log --decorate | tail --lines=11
commit cb3da2d4d8c3378919844b29e815bfd5fdc0210c
Author: Your Name <[email protected]>
Date:   Fri May 24 14:04:10 2013 +0200

    Another message
 
commit 4a46bc886318679d8b15e05aea40b83ff6c3bd47 (grafted)
Author: Your Name <[email protected]>
Date:   Thu May 23 22:27:48 2013 +0200

    Some message

If everything looks as intended, run git filter-branch -- --all to make it permanent.

BEWARE: after doing the filter-branch step, all commit ids will have changed, so anybody using the old repo must never merge with anyone using the new repo.
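
Since the grafts file is deprecated, roughly the same effect can be sketched with git replace --graft, which records a replacement commit with the parents you list (none here). This is a minimal sketch rather than the method used in this answer: it builds a throwaway repository in a temporary directory, and the five empty commits, the identity, and the choice of HEAD~2 as the new root are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: works in a throwaway repository, never in your real one.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Your Name"

# Build a small linear history: commit 1 .. commit 5
for i in 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $i"
done

before=$(git rev-list --count HEAD)

# Hypothetical choice of new root: "commit 3"
new_root=$(git rev-parse HEAD~2)

# Record a replacement for new_root that has no parents
git replace --graft "$new_root"

after=$(git rev-list --count HEAD)
echo "visible commits before: $before, after: $after"
git log --oneline
```

The replacement is only an overlay stored under refs/replace: the old commits are still in the object store, so making the cut permanent still takes the filter-branch step described above.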

Eldwen answered 5/2, 2011 at 19:46 Comment(15)
Well, after creating a '.git/info/grafts' file and filter-branch, I still needed a 'git clone --no-local --no-hardlinks' copy (make all your local tracking branches before that). Simply removing '.git/info/grafts' does not do the trick!Insomuch
You probably want to cross-check stackoverflow.com/questions/7654822/… when you want to shrink your repository size.Insomuch
I had to do git filter-branch --tag-name-filter cat -- --all to update tags. But I've also got older tags pointing to the old history that I want to delete. How can I get rid of all those old tags? If I don't delete them, then the older history doesn't disappear and I can still see it with gitk --all.Zepeda
"Just create a graft of the parent of your new root commit to no parent" needs some elaboration. I tried that and failed to figure out the syntax for "no parent". Manual page claims a parent commit ID is required; using all zeroes just gives me an error.Tarrah
In case anyone else was wondering how exactly it works, it's pretty easy: echo "<NEW-ROOT-HASH>" > .git/info/graftsTodd
Can someone explain what this means? "Just create a graft of the parent of your new root commit to no parent (or to an empty commit, eg. the real root commit of your repo)."Somewhere
I agree, explaining what a graft is would be more than usefulLadawnladd
Didn't work for me. Created a mess in history with both old and new commit IDs.Pikeman
This doesn't seem to actually remove old commits; they can be seen in git-log and checked out.Foretopmast
The force option actually deleted the branch for me. git filter-branch -f -- --allKampong
Quoted from the linked wiki page on grafts. "As of Git 1.6.5, the more flexible git replace has been added, which allows you to replace any object with any other object, and tracks the associations via refs which can be pushed and pulled between repos." So this answer might be out of date for current versions of git.Canberra
Does this method disassociate previous tags with commits? It seemed to scramble some tags for me...Beauvais
Does not work. git log after creating .git/info/grafts still shows initial commit.Ascend
This definitely does NOT work any longer $ git replace --convert-graft-file hint: Support for <GIT_DIR>/info/grafts is deprecated hint: and will be removed in a future Git version. hint: hint: Please use "git replace --convert-graft-file" hint: to convert the grafts into replace refs. hint: hint: Turn this message off by running hint: "git config advice.graftFileDeprecated false" And it does not appear git replace --convert-graft-file has the desired effect either.Night
Not working as of 2022. There is an answer below that works without issues: https://mcmap.net/q/14799/-how-do-i-remove-the-old-history-from-a-git-repositoryPilch
S
232

If you want to free some space in your git repo, but do not want to rebuild all your commits (rebase or graft), and still want to be able to push/pull/merge with people who have the full repo, you can use a shallow clone (git clone with the --depth parameter).

# Clone the original repo into limitedRepo
git clone file:///path_to/originalRepo limitedRepo --depth=10

# Remove the original repo, to free up some space
rm -rf originalRepo
cd limitedRepo
git remote rm origin

You may be able to shallow your existing repo, by following these steps:

# Shallow to last 5 commits
git rev-parse HEAD~5 > .git/shallow

# Manually remove all other branches, tags and remotes that refer to old commits

# Prune unreachable objects
git fsck --unreachable # Will show you the list of what will be deleted
git gc --prune=now     # Will actually delete your data

See also: How to remove all git local tags?

PS: Older versions of git didn't support clone/push/pull from/to shallow repos.
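
To see what the --depth clone actually keeps, here is a small sketch that can be run safely. It assumes a throwaway source repository created in a temporary directory; the directory names mirror the ones used above, but the five empty commits, the identity, and the depth of 2 are illustrative choices only.

```shell
#!/bin/sh
# Sketch only: every path here lives in a fresh temporary directory.
set -e
work=$(mktemp -d)
cd "$work"

git init -q originalRepo
cd originalRepo
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Your Name"
for i in 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $i"
done
cd ..

# file:// forces the normal transport; with a plain local path git
# ignores --depth and copies the whole history.
git clone -q --depth=2 "file://$work/originalRepo" limitedRepo

full=$(git -C originalRepo rev-list --count HEAD)
kept=$(git -C limitedRepo rev-list --count HEAD)
echo "original: $full commits, shallow clone: $kept commits"
```

Note that --depth implies a single-branch clone; as one of the comments below points out, --no-single-branch may be needed to keep the other branches.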

Skycap answered 16/1, 2016 at 16:51 Comment(15)
+1 This is the correct answer for newer versions of Git. (Oh, and please come back to PPCG!)Garnierite
It looks like you need at least git 1.9 for this to work. I'm not sure the exact version though, because I just went to 2.8 and it worked like a charm.Zabaglione
@Trogdor The answer should say cd limitedRepo since that is where you need to remove the reference to a nonexistent origin. I've submitted an edit.Nickelodeon
When I try to push this shallow clone to a new repo (which I want to do because I want to get rid of my repo's history and start a new repo with a much smaller history) I get an error from Gitlab that a shallow update is not allowed. There needs to be a way to turn a shallow clone into a normal repo without restoring all of the extra history again.Mycorrhiza
@Mycorrhiza That would be the other top voted answer. This answer isn't for you if you want to permanently get rid of the history. It's for working with huge histories.Spectator
What if you want to keep a few hundred commits out of thousands? Calculating the depth can become tricky. I like the clone approach, but is there a way to target an old commit hash as initial instead of a depth number?Ceyx
To answer my own question: git clone file:///Users/me/Projects/myProject myClonedProject --shallow-since=2016-09-02 Works like a charm!Ceyx
@Mycorrhiza you can convert your shallow repo into normal one by running git filter-branch -- --all. This will change all hashes in it but after that you will be able to push it to a new repoMousetrap
@Mycorrhiza Set receive.shallowupdate option for a new repo to be able to push shallow clone to it: https://mcmap.net/q/20049/-shallow-update-not-allowed-git-gt-1-9Foldboat
This is a nice solution. All changes remain in history, though. To remove that use e.g. this script: gist.github.com/ymollard/3f642ebda433a7cb8bd5Emigration
fatal: Server does not support --shallow-since :(Leighton
If you want to truncate history based on a specified date , you can use the option --shallow-since=<date> to "Create a shallow clone with a history after the specified time.", in place of --depth <depth> which "Creates a shallow clone with a history truncated to the specified number of commits."Sipes
You may need the --no-single-branch option. Otherwise you lose all your other branches.Manifestation
If you're reading this & running from Windows bash, format file path with drive letter included --> file:///c/Users/Marc\ Ochsner/...Brueghel
find common ancestor of all branches: git merge-base $(git branch --format "%(refname)")Bink
L
87

This answer uses git rebase and hence will potentially have git conflicts that would have to be re-resolved. This method is easy to understand and works fine. The argument to the script ($1) is a reference (tag, hash, ...) to the commit starting from which you want to keep your history.

#!/bin/bash
git checkout --orphan temp "$1" # create a new branch without parent history
git commit -m "Truncated history" # create a first commit on this branch
git rebase --onto temp "$1" master # now rebase the part of master branch that we want to keep onto this branch
git branch -D temp # delete the temp branch

# The following 2 commands are optional - they keep your git repo in good shape.
git prune --progress # delete all the objects w/o references
git gc --aggressive # aggressively collect garbage; may take a lot of time on large repos

Note that old tags will still be present, so you may need to remove them manually.
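
The tag cleanup can be sketched as follows. This is an illustration under stated assumptions, not part of the script above: it runs the same four commands in a throwaway repository (five hypothetical commits, each tagged v1 to v5) and then uses git tag --no-merged to list tags that are not reachable from the rewritten master. Because the rebase gives every kept commit a new ID, all pre-rewrite tags show up as stale, including those that sat on commits you kept; those would have to be re-created on the new commits if you still want them.

```shell
#!/bin/sh
# Sketch only: runs in a throwaway repository in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Your Name"

for i in 1 2 3 4 5; do
    echo "$i" > "file$i"
    git add "file$i"
    git commit -q -m "commit $i"
    git tag "v$i"
done

start=$(git rev-parse HEAD~2)             # hypothetical cut point: "commit 3"

# The four commands from the script above
git checkout -q --orphan temp "$start"
git commit -q -m "Truncated history"
git rebase -q --onto temp "$start" master
git branch -q -D temp

# Tags whose commits are not reachable from the rewritten master
stale=$(git tag --no-merged master)
echo "stale tags:" $stale
if [ -n "$stale" ]; then
    git tag -d $stale >/dev/null          # unquoted on purpose: one argument per tag
fi

commits=$(git rev-list --count master)
remaining=$(git tag)
echo "commits on master: $commits"
```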

Remark: I know this is almost the same answer as @yoyodin's, but there are some important extra commands and information here. I tried to edit the answer, but since it is a substantial change to @yoyodin's answer, my edit was rejected, so here's the information!

Lubricious answered 21/5, 2014 at 15:41 Comment(20)
I appreciate the explanations given for the git prune and git gc commands. Is there an explanation for the rest of the commands in the script? As it stands, it is not clear what arguments are being passed to it and what each command is doing. Thanks.Colonist
@Colonist thanks for your remark, I added some more comments for each command. Hope this helps.Lubricious
@ChrisMaes is git prune --progress for an older version of git? Per the docs, "In most cases, users will not need to call git prune directly, but should instead call git gc, which handles pruning along with many other housekeeping tasks."Gaudery
@ypcrumble. I don't know the exact history of those features... But note that the last commands are optional. Git GC should run automatically after a while...Lubricious
Merge conflicts all over the place... not very usefulLance
@Warpzit: at what step did you encounter those merge conflicts? It is quite strange...Lubricious
@ChrisMaes at the rebase step. Kinda frustrating as this seems like the easiest solution.Lance
are you sure that $1 is a direct ancestor of the commit you were on when you started the script? $1 should be on the master branch (and supposing you want to purge the master branch)? Did none of the previous steps give you errors?Lubricious
@ChrisMaes I'm sure it was master, I changed to a point further ahead but same issue!Lance
Let us continue this discussion in chat.Lubricious
@Warpzit, did you ever manage to find out why the conflict happened? I am also experiencing merge conflicts when rebasing..Aground
@Aground Nope, but we removed old large files from history and other tweaks instead of removing whole history. Also the biggest issue was the build server which we changed to use shallow clone.Lance
For reference, this was very slow (slower than the solution with graft/filter-branch), and the procedure kept failing because it needed around 60 GB of disk space, which I didn't have. This solution may work for smaller repositories, though.Informant
After several attempts, I always get merge conflicts during rebase. Removing all tags did not seem to help. git version: 2.19. Does anyone know why merge conflicts even occur?Complot
@ScottWiedemann the merge conflicts might arise if you have a complicated history with merges. The conflicts probably arise when doing git rebase --onto temp $1 masterLubricious
@Lance I got rid of merge conflicts by adding -p to the rebase command, as suggested in other answerCarlie
I followed this exactly, and all I got was the same history as before with a new branch starting at the commit I wanted to prune to with all the same history as before. No history was removed.Annamarieannamese
I got conflicts too. But a force push (git push -f) seems to deal with the problem.Azriel
@Azriel this is completely normal, since the history was rewritten.Lubricious
too many conflicts when doing the rebase. git rebase -p does not solve it for meTransonic
I
62

This answer uses git rebase and hence will potentially have git conflicts that would have to be re-resolved. Try this method, from "How to truncate git history":

#!/bin/bash
git checkout --orphan temp "$1"
git commit -m "Truncated history"
git rebase --onto temp "$1" master
git branch -D temp

Here $1 is the SHA-1 of the commit you want to keep. The script creates a new branch that contains all commits between $1 and master, and all older history is dropped. Note that this simple script assumes that you do not have an existing branch called temp. Also note that this script does not clear the git data for the old history. Run git gc --prune=all && git repack -a -f -F -d after you've verified that you truly want to lose all history. You may also need rebase --preserve-merges, but be warned that the git implementation of that feature is not perfect. Inspect the results manually if you use it.

Ichthyolite answered 25/7, 2011 at 11:17 Comment(10)
Works for me, except I had to work around the lack of "git checkout --orphan" on my version of git: bogdan.org.ua/2011/03/28/…Allen
I tried this, but got merge conflicts in the rebase step. Strange--I wasn't expecting that merge conflicts could be possible in these circumstances.Zepeda
Use git commit --allow-empty -m "Truncate history" if the commit you checked out does not contain any files.Todd
How do I push this back to the remote master? When I do that I end up with both old and new history.Pikeman
What is 'temp' supposed to be? What are you supposed to pass as an argument for this? Is there an example of what these commands are supposed to look like when you actually run them? Thanks.Colonist
I believe $1 is the commit hash. (There are more details provided in the linked article).Ante
This was by far the easiest solution, and it's not necessary to put it in a bash fileNystagmus
@CraigMcQueen try using git rebase -p --onto temp $1 master (with the -p). That preserves merge commits and should avoid merge conflicts. Otherwise rebase tries to flatten merge commits.Jorge
This answer is incredibly useful; big huge thanks to Mikko.Redbird
I get $ git rebase --onto temp $1 master First, rewinding head to replay your work on top of it... Fast-forwarded temp to temp. $ git branch -D temp error: Cannot delete branch 'temp' checked out at '/home/user/Public/newspapa' Ralleigh
L
38

As an alternative to rewriting history, consider using git replace as in this article from the Pro Git book. The example discussed involves replacing a parent commit to simulate the beginning of a tree, while still keeping the full history as a separate branch for safekeeping.

Lenoralenore answered 26/10, 2012 at 19:17 Comment(7)
Yes, I think you could probably do what we wanted with that, if you nuked the separate full history branch as well. (We were trying to shrink the repository.)Pace
I was discouraged by the answer being off-site; but it does link to the GitScm site and the tutorial that it links to is very well written and seems directly to the point of the OP's question.Canberra
@Canberra Sorry about that! I'll develop the answer a little more fully on-siteLenoralenore
Unfortunately this is not an alternative to rewriting history. There is a confusing sentence in the beginning of the article that probably gave this impression. Could that be removed from this answer? You'll see in the article that the author does rewrite the history of the truncated branch, but proposes a way of reattaching the legacy "history" branch using git replace. I believe this was corrected on another question where you posted this answer.Anorthic
A discussion of git replace versus git graft is made at https://mcmap.net/q/13240/-how-do-git-grafts-and-replace-differ-are-grafts-now-deprecated/873282Kernel
I've had to read quite a bit to understand how to shrink my repository, indeed, it turns out that git replace is the way to go, please consider reading stackoverflow.com/questions/6800692/… I've done it through git replace and it works just fine.Unattended
@JoelAZEMAR does this preserve the remaining history on all branches?Colettecoleus
K
35

If you want to keep the upstream repository with full history, but local smaller checkouts, do a shallow clone with git clone --depth=1 [repo].

After pushing a commit, you can do

  1. git fetch --depth=1 to prune the old commits. This makes the old commits and their objects unreachable.
  2. git reflog expire --expire-unreachable=now --all to expire all old commits and their objects.
  3. git gc --aggressive --prune=all to remove the old objects

See also How to remove local git history after a commit?.

Note that you cannot push this "shallow" repository to somewhere else: "shallow update not allowed". See Remote rejected (shallow update not allowed) after changing Git remote URL. If you want to do that, you have to stick with grafting.
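
Before trying to push a clone elsewhere, it can help to check whether it is shallow at all. A minimal sketch, assuming a reasonably recent Git (the --is-shallow-repository option of git rev-parse is not available in old versions) and a throwaway upstream created in a temporary directory; the repository names and identity are hypothetical.

```shell
#!/bin/sh
# Sketch only: everything is created under a fresh temporary directory.
set -e
work=$(mktemp -d)
cd "$work"

git init -q upstream
cd upstream
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Your Name"
for i in 1 2 3; do
    git commit -q --allow-empty -m "commit $i"
done
cd ..

git clone -q --depth=1 "file://$work/upstream" shallowCopy

# Prints "true" for a shallow repository, "false" otherwise
is_shallow=$(git -C shallowCopy rev-parse --is-shallow-repository)
upstream_shallow=$(git -C upstream rev-parse --is-shallow-repository)
echo "shallowCopy shallow? $is_shallow"
echo "upstream shallow?    $upstream_shallow"
```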

Kernel answered 8/5, 2016 at 22:21 Comment(1)
Point number 1. made the difference for me. CheersZootechnics
S
27

I needed to read several answers and some other info to understand what I was doing.

1. Ignore everything older than a certain commit

The file .git/info/grafts can define fake parents for a commit. A line with just a commit id says that the commit doesn't have a parent. If we want to say that we care only about the last 2000 commits, we can type:

git rev-parse HEAD~2000 > .git/info/grafts

git rev-parse gives us the commit id of the 2000th parent of the current commit. The above command will overwrite the grafts file if present. Check if it's there first.

2. Rewrite the Git history (optional)

If you want to make this grafted fake parent a real one, then run:

git filter-branch -- --all

It will change all commit ids. Every copy of this repository needs to be updated forcefully.

3. Clean up disk space

I didn't do step 2, because I wanted my copy to stay compatible with the upstream. I just wanted to save some disk space. In order to forget all the old commits:

git prune
git gc

Alternative: shallow copies

If you have a shallow copy of another repository and just want to save some disk space, you can update .git/shallow. But be careful that nothing is pointing at a commit from before. So you could run something like this:

git fetch --prune
git rev-parse HEAD~2000 > .git/shallow
git prune
git gc

The entry in shallow works like a graft. But be careful not to use grafts and shallow at the same time. At least, don't have the same entries in there, it will fail.

If you still have some old references (tags, branches, remote heads) that point to older commits, they won't be cleaned up and you won't save more disk space.
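
One way to find such leftover references is to list every ref that does not contain the cut commit. The sketch below assumes a Git version whose for-each-ref supports --no-contains, and it uses a throwaway repository; the ref names (old-release, legacy, recent) and the cut point are hypothetical. Keep in mind that a ref listed this way might also be a newer branch that diverged before the cut, so inspect the list before deleting anything.

```shell
#!/bin/sh
# Sketch only: runs in a throwaway repository in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Your Name"
for i in 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $i"
done

cut=$(git rev-parse HEAD~2)    # hypothetical cut point: "commit 3"
git tag old-release HEAD~4     # points at "commit 1", before the cut
git branch legacy HEAD~3       # points at "commit 2", before the cut
git tag recent HEAD~1          # points at "commit 4", after the cut

# Refs from which the cut commit is NOT reachable: these keep old history alive
blockers=$(git for-each-ref --no-contains "$cut" --format='%(refname)')
echo "refs still pointing at old history:"
echo "$blockers"
```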

Sejant answered 1/5, 2017 at 6:33 Comment(2)
Support for <GIT_DIR>/info/grafts is deprecated and will be removed in a future Git version.Chopper
Please consider using git replace instead. See stackoverflow.com/questions/6800692/…Unattended
S
6

There are too many answers here which are not current and some don't fully explain the consequences. Here's what worked for me for trimming down the history using latest git 2.26:

First, create a dummy commit. This commit will appear as the first commit in your truncated repo. You need it because it will hold all the base files for the history you are keeping. The SHA passed to the command is the ID of the parent of the first commit you want to keep (in this example, 8365366). The string 'Initial' will show up as the commit message of the first commit. If you are using Windows, type the command below at the Git Bash prompt.

# 8365366 is id of parent commit after which you want to preserve history
echo 'Initial' | git commit-tree 8365366^{tree}

Above command will print SHA, for example, d10f7503bc1ec9d367da15b540887730db862023.

Now just type:

# d10f750 is commit ID from previous command
git rebase --onto d10f750 8365366

This will first put all files as of commit 8365366 into the dummy commit d10f750. Then it will play back all commits after 8365366 on top of d10f750. Finally, the master branch pointer will be updated to the last commit played back.

Now, if you want to push this truncated repo, just do git push -f.

A few things to keep in mind (these apply to other methods as well as this one): Tags are not transferred. The rebased commits get new IDs; their author timestamps are preserved, but GitHub will list them lumped together under a single heading such as "Commits on XY date".

Fortunately, it is possible to keep the truncated history as an "archive" and later join the trimmed repo back with the archive repo. To do this, see this guide.
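
The two commands above can be tried end to end in a throwaway repository. This is a sketch under assumptions: the five commits, the file names, the identity, and the choice of HEAD~2 as the base commit are invented for illustration, and the repository lives in a temporary directory.

```shell
#!/bin/sh
# Sketch only: runs in a throwaway repository in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Your Name"
for i in 1 2 3 4 5; do
    echo "$i" > "file$i"
    git add "file$i"
    git commit -q -m "commit $i"
done

base=$(git rev-parse HEAD~2)   # hypothetical: keep history after "commit 3"

# Parentless commit that carries the full tree of the base commit
new_root=$(echo 'Initial' | git commit-tree "$base^{tree}")

# Replay everything after base on top of the new root
git rebase -q --onto "$new_root" "$base"

count=$(git rev-list --count HEAD)
first=$(git log --format=%s | tail -n 1)
echo "commits now: $count, oldest commit message: $first"
```

All five files are still present in the working tree afterwards, since the new root carries the tree of the base commit.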

Schnell answered 18/5, 2020 at 9:59 Comment(1)
This answer results in the same warnings when doing the rebase as another one in this threadTransonic
P
2

When rebasing or pushing to head/master, this error may occur:

remote: GitLab: You are not allowed to access some of the refs!
To git@giturl:main/xyz.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@giturl:main/xyz.git'

To resolve this issue, remove the master branch from "Protected branches" in the GitLab dashboard.

Then you can run this command:

git push -f origin master

or

git rebase --onto temp $1 master
Partheniaparthenocarpy answered 3/1, 2017 at 15:55 Comment(0)
B
1

For existing repository cloned previously with --depth

git clone --depth=1 ...

Just do

git pull --depth=1 --update-shallow

https://git-scm.com/docs/git-pull

Ballard answered 15/9, 2021 at 6:27 Comment(0)
G
0

In my case, I wanted to split a repo in two, keeping the history but cleaning the log of commits for files that were filtered out of the new repo.

This was the solution:

# space-separated list of paths to keep (the quotes are required)
PATHS="path_a path_b"
git filter-branch -f --prune-empty --index-filter "git read-tree --empty
git reset \$GIT_COMMIT -- $PATHS" -- --all -- $PATHS

This way I got a new repo with the full commit log history, but only for the paths I wanted to keep.

Ref: https://mcmap.net/q/20050/-how-to-extract-one-file-with-commit-history-from-a-git-repo-with-index-filter-amp-co

Godric answered 20/8, 2022 at 10:6 Comment(0)
S
-3

According to the Git repo of the BFG tool, it "removes large or troublesome blobs as git-filter-branch does, but faster - and is written in Scala".

https://github.com/rtyley/bfg-repo-cleaner

Scyphozoan answered 7/8, 2017 at 14:11 Comment(0)
A
-9
  1. remove git data, rm -rf .git
  2. git init
  3. add a git remote
  4. force push
Aniakudo answered 22/1, 2015 at 5:26 Comment(2)
That will work to remove ALL history, but not for what he asked: keep history since January 2010Lubricious
Just wanted to say thanks as it helped me in my scenario even though this might not be the right answer to the questionDifficulty

© 2022 - 2024 — McMap. All rights reserved.