How to delete the old git history?

I have a git repository with many, many (2000+) commits, for example:

                 l-- m -- n   
                /
a -- b -- c -- d -- e -- f -- g -- h -- i -- j -- k
                     \
                      x -- y -- z

and I want to truncate the old history: delete all commits before (for example) commit "f", so that "f" becomes the beginning of the repository.

How to do it?

Sherd answered 31/1, 2017 at 8:56 Comment(4)
rebase is the tool for changing history. - Conflagration
What is the problem you want to solve? - Lack
It sounds to me like he thinks 2000 is too many and wants to solve a problem. Let's not point fingers when it's an assumption =) - Identity
Be sure to also see #4516080, which has some of the same answers but with additional useful edits and comments. - Friary

To avoid losing any history, make a copy of your repository first :). Here we go (<f> is the SHA of commit f, which you want to become the new root commit):

git checkout --orphan temp <f>      # check out the state of the repo at commit f on a new, parentless branch named "temp"
git commit -m "new root commit"     # commit that state; this becomes the new root commit
git rebase --onto temp <f> master   # replay the commits after <f> (up to master) on top of the temp branch
git branch -D temp                  # the temp branch is no longer needed
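A minimal sketch for trying the four commands above on a throwaway repository. It assumes a POSIX shell, git on PATH, and a purely linear history a..h; the temp directory, file name and commit messages are made up for illustration. On such a history the rebase replays g and h without conflicts, and the final tree is identical to the original:

```shell
#!/bin/sh
# Sketch only: throwaway repo with a linear history a..h (illustrative names); "f" becomes the new root.
set -eu

# Commit identity for the demo (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever init.defaultBranch says

# One commit per letter, each appending a line to the same file.
for c in a b c d e f g h; do
  echo "$c" >> letters.txt
  git add letters.txt
  git commit -q -m "$c"
  if [ "$c" = f ]; then f=$(git rev-parse HEAD); fi
done
old_tree=$(git rev-parse 'master^{tree}')

# The four steps from the answer, with <f> filled in.
git checkout -q --orphan temp "$f"
git commit -q -m "new root commit"
git rebase -q --onto temp "$f" master
git branch -q -D temp

git log --oneline master          # h, g, new root commit
git rev-parse 'master^{tree}'     # same tree id as $old_tree: only the history changed
```

On a history with merge commits after f, the rebase step can still stop with conflicts, as several of the comments report.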

If you have a remote that should get the same truncated history, you can use git push -f. Warning: this is a dangerous command; don't use it lightly! To make sure that the latest version of your code is still the same, run git diff origin/master before pushing. It should show no changes, since only the history changed, not the content of your files.

git push -f  

The following 2 commands are optional - they keep your git repo in good shape.

git prune --progress                 # delete all the objects w/o references
git gc --aggressive                  # aggressively collect garbage; may take a lot of time on large repos
Douglasdouglashome answered 31/1, 2017 at 9:1 Comment(11)
Sounds like what I need, but each time I run the third step (git rebase ...) I get conflicts. Is that normal? - Schiff
No, that doesn't seem normal. Did you include the <f> part (that is, the same commit SHA from which you created the temp branch) in that third step? - Douglasdouglashome
@ChrisMaes, I get conflicts too. I can see from the commit messages that the third command tries to apply old commits, from before <f>. - Illiterate
I'm getting conflicts as well. This might not work with big repos. - Victoir
I am getting conflicts as well. Is there any way to force the rebase? - Londrina
This answer is copied and pasted all over Stack Overflow, and it is clearly incorrect (because it explicitly generates rebase conflicts by applying old commits on top of <f>). - Certification
@gented Rebasing commits doesn't automatically create merge conflicts. If you have a simple linear history, no conflicts should arise (I have actually run this code). When you have merge commits in your history, things can get complicated and merge conflicts can arise. - Douglasdouglashome
@ChrisMaes Of course conflicts don't happen if there are no conflicts, and they happen if there are; that's a tautology :p. Your comment above ("no, that doesn't seem normal") is incorrect, because conflicts are exactly what to expect with this method (files modified both before and after a certain <f>) in almost all practical projects. - Certification
@gented Please read my last comment again; I did not use a tautology. The words "commit" and "conflict" do not mean the same thing. My other comment could have been better, I agree, and it would say the same as my last comment: for a simple, linear history after commit <f>, no conflicts should arise. When there are merge commits after <f> (especially merge commits bringing in code from before <f>), conflicts will probably arise. - Douglasdouglashome
I am getting conflicts as well. My repo does have merge commits after <f>, which might be the cause. Is there a solution for that case? - Giovannagiovanni
I get conflicts too. This clearly isn't the right way to do it (though I don't know what is). - Destructible

A possible solution to your problem is git clone with the --shallow-since option. If there are only a few commits since f and counting them is no trouble, you can use the --depth option instead.

The second option (--depth) clones only the specified branch. If you need additional branches, you can then add the original repo as a remote and use git fetch to retrieve them.
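A minimal sketch of the --depth variant against a throwaway local repository (paths, file name and commit messages are illustrative; assumes a POSIX shell and git on PATH). Note the file:// URL: for a plain local path, git warns that --depth is ignored and copies the full history.

```shell
#!/bin/sh
# Sketch only: shallow-clone a throwaway 10-commit repo, keeping only the last 3 commits.
set -eu

# Commit identity for the demo (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

src=$(mktemp -d)                  # stands in for the original repository
dst="$(mktemp -d)/truncated"      # the new, truncated clone

git init -q "$src"
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "$i" >> "$src/numbers.txt"
  git -C "$src" add numbers.txt
  git -C "$src" commit -q -m "commit $i"
done

# file:// makes git use its normal transport, so the depth limit is honoured.
git clone -q --depth 3 "file://$src" "$dst"

git -C "$dst" rev-list --count HEAD   # 3
git -C "$src" rev-list --count HEAD   # 10: the original is untouched
```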

When you are pleased with the result, remove the old repository and rename the new one to replace it. If the old repository is remote then re-create it after removal and push from the new repo into it.

This approach has the advantage of size and speed. The new repo contains only the commits you want and there is no need to run git prune or git gc to remove the old objects (because they are not there).

Mulct answered 31/1, 2017 at 9:20 Comment(6)
A nice alternative. +1 - Douglasdouglashome
If you want to keep the history, but only on the remote, don't do the last step. For my application, this is the best configuration: I have the bloated history on the remote in the unlikely event I need it, but locally, clones and updates are quick and don't take up much disk space. - Matias
The advice about re-creating the remote did not work for me: [remote rejected] develop -> develop (shallow update not allowed). - Illiterate
I tried to be clever and push the shallow clone into a new branch (instead of a new origin). But GitHub still remembered the "deleted" history. In other words, I re-created a branch on origin, not the whole origin, and the history didn't budge. Why is that? Why do I have to re-create the origin? - Fence
@MaximKamalov It depends on where your new branch starts from. If it starts from the current master, it inherits the entire history of master. Use a GUI Git client to see the history and the relationships between commits. - Mulct
By the way, in the end I used this method: #41953800. It's more convenient in the case of GitHub, because the origin has Issues attached to it, so it's not easy to re-create it. - Fence

For those who get a lot of merge conflicts (and broken results) with rebase --onto, I'd like to recommend this script, which uses git filter-branch:

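#!/bin/sh
# Usage: git-truncate.sh <cut_sha> <branch>
# <cut_sha> is the newest commit to DROP; its child becomes the new root.

cut_sha="$1"
branch="$2"

# Remove "-p <cut_sha>" from the parent list of every commit that has it,
# which turns the child of <cut_sha> into a parentless (root) commit.
git filter-branch \
  --parent-filter "sed -e 's/-p $cut_sha[0-9a-f]*//'" \
  --prune-empty \
  -- "$branch"

# Delete the backup refs that filter-branch leaves under refs/original.
git for-each-ref --format='%(refname)' refs/original | \
  while read -r ref
  do
    git update-ref -d "$ref"
  done

# Expire all reflog entries and repack, so the old commits become
# unreachable and are removed from the object store.
git reflog expire --expire=now --all
git repack -ad
git prune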

Source: https://github.com/adrienthebo/git-tools/blob/master/git-truncate

Instructions:

  1. Save the script above in the local repository root (e.g. as git-truncate.sh).
  2. Check out the branch you'd like to truncate (e.g. master).
  3. Walk back through the history and find the SHA of the first (newest) commit you want to cut off (assume it's 2c75a32), AND make sure that commit has no parallel branches!
  4. Run it like this: $ ./git-truncate.sh 2c75a32 master
  5. (Force-push if a remote is present.)

IMPORTANT: The SHA must be "part" of the branch and it must be the first commit you want to delete. Don't pass the first commit you want to keep (the new "beginning of repository" commit)!
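To see what the parent filter does without risking a real repository, here is a sketch on a throwaway linear repo a..h (illustrative names; assumes a POSIX shell and git 2.24 or later, where FILTER_BRANCH_SQUELCH_WARNING only skips filter-branch's warning and pause). Passing e, the newest commit to delete, makes f the new root while the final tree stays the same:

```shell
#!/bin/sh
# Sketch only: apply the script's parent filter to a throwaway linear repo a..h (illustrative names).
set -eu

# Commit identity for the demo (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever init.defaultBranch says

for c in a b c d e f g h; do
  echo "$c" >> letters.txt
  git add letters.txt
  git commit -q -m "$c"
  # Abbreviated SHA on purpose: the sed pattern's [0-9a-f]* matches the rest.
  if [ "$c" = e ]; then cut_sha=$(git rev-parse --short HEAD); fi
done
old_tree=$(git rev-parse 'master^{tree}')

# e is the newest commit to DROP; deleting "-p e..." from f's parent list
# makes f a root commit.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --parent-filter "sed -e 's/-p $cut_sha[0-9a-f]*//'" \
  --prune-empty \
  -- master

git log --oneline master   # h, g, f (f now has no parent)
```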

Cap answered 30/1, 2020 at 17:17 Comment(0)

Without rebasing (from this link). Note that this replaces the entire history with a single commit:

git checkout --orphan temp  # create a temporary branch
git add -A  # Add all files and commit them
git commit -m 'Add files'
git branch -D master  # Deletes the master branch
git branch -m master  # Rename the current branch to master
git push -f origin master  # Force push master branch to server

Clean up repo:

# Local master tracks origin/master
git branch --set-upstream-to=origin/master master 
git gc --aggressive --prune=all  # remove the old files
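A sketch of the same orphan-branch steps on a throwaway repo (no remote, so the push and upstream steps are skipped; directory, file names and messages are illustrative; assumes a POSIX shell and git on PATH). It confirms what the comment below warns about: the result is a single commit whose tree matches the old master.

```shell
#!/bin/sh
# Sketch only: squash all history into one commit with the orphan-branch steps (illustrative names).
set -eu

# Commit identity for the demo (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever init.defaultBranch says

for i in 1 2 3 4 5; do
  echo "$i" > "file$i.txt"
  git add -A
  git commit -q -m "commit $i"
done
old_tree=$(git rev-parse 'master^{tree}')

git checkout -q --orphan temp   # new branch with no parents; the files stay staged
git add -A
git commit -q -m 'Add files'
git branch -q -D master         # drop the old branch and, with it, its history
git branch -m master            # rename temp to master

git log --oneline               # a single "Add files" commit
```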
Leonoreleonsis answered 13/9, 2023 at 13:41 Comment(1)
I used this and it worked, but be aware that it wipes out all history. - Menticide

Recall that a Git commit is a snapshot. A snapshot is self-contained; it does not need any information from its parents. Each commit links to zero or more parents, and a root commit has none.

If you want the repository to start from commit f then you can change that commit (with git-replace(1)) to have no parents.

git replace --graft f

Now f (through its replace reference) will have no parents. [1]
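A sketch showing the effect of the graft on a throwaway linear repo a..h (illustrative names; assumes a POSIX shell and git on PATH). History walks honour the replacement, while --no-replace-objects still sees the original chain. The git filter-repo step is not run here because it is a separate install:

```shell
#!/bin/sh
# Sketch only: graft away f's parents in a throwaway linear repo a..h (illustrative names).
set -eu

# Commit identity for the demo (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever init.defaultBranch says

for c in a b c d e f g h; do
  echo "$c" >> letters.txt
  git add letters.txt
  git commit -q -m "$c"
  if [ "$c" = f ]; then f=$(git rev-parse HEAD); fi
done

# No parents listed after the commit, so the replacement for f is a root commit.
git replace --graft "$f"

git rev-list --count master                        # 3 (h, g, f): replacement honoured
git --no-replace-objects rev-list --count master   # 8: the original objects are untouched
git replace -l                                     # lists the replaced commit (f)
```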

If you want to make this change permanent then you need to rewrite the whole history starting from f. You can do that with the third-party tool git-filter-repo(1). [2]

(Note that the following command (with --force) should be considered irreversible; make sure that it is what you want.)

git filter-repo --force

Notes

  1. git(1) respects these replacements by default. Use --no-replace-objects as an argument to git in order to not use them, for example git --no-replace-objects log
  2. The deprecated built-in git-filter-branch(1) mentions this as an alternative
Brotherton answered 22/9, 2023 at 13:11 Comment(0)
