What are the practical consequences of rewriting Git history?

Our project has been using Git for a week or so now, and we're all enjoying it a lot (using it in a tight collaborative group turns out to be quite a different Git experience). To keep things as simple as possible, we do not do any rebasing or history modifications. But we did make a few mistakes in the first week. A few commits were made that shouldn't have been, and we managed to merge a feature branch into the wrong integration branch (1.1 instead of 1.0). And we didn't find out about these things until they were far back in our history.

Now I see a lot of warnings about rewriting history, but I'm not really sure I understand the dangers involved. We use a shared bare repository, and all branches are pushed there for backup.

I would expect that if you rewrite history (say remove a commit), the full list of subsequent commits will "lose" that commit (and maybe not compile/work). I would also expect that if this happens I could actually choose to fix this at the top of history (and just leave that part of history as non-compiling).

  • If I rewrite history (and everything compiles/works in all affected branches), will my co-workers need to do any special commands? (In other words, will they "know that I have done it" if I did it well?)
  • Will any users with local changes that I do not know about be eligible for merge failures on git pull?
  • Am I missing anything essential here?

Any references to articles/tutorials on this subject would also be really nice.

Instantaneous answered 29/9, 2009 at 7:2 Comment(1)
@Goober: Naive response from me here, is that a problem? We're test driven so I believe we'll catch it. – Instantaneous
H
25

Required reading is Problems with rewriting history in the Git User's Manual.

If I rewrite history (and everything compiles/works in all affected branches), will my co-workers need to do any special commands (i.e. will they "know that I have done it" if I did it well?)?

They will know, and Git will tell them in no uncertain terms that something is wrong. They will get unexpected error messages and may, in the process of trying to resolve the resulting merge conflicts, inadvertently revert previous commits. This problem creates a real mess, and if you're curious to see what happens, you can always try it on a temporary copy of your repositories.
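For anyone who wants to try this safely, here is a minimal sketch of such an experiment in a throwaway directory. The repository names (`shared.git`, `alice`, `bob`), the branch name `main`, and the file names are all made up for illustration; it assumes a reasonably recent Git on the PATH.

```shell
#!/bin/sh
# Sketch: what a co-worker sees after shared history is rewritten.
# All names here (shared.git, alice, bob, main) are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare shared.git

# Alice publishes three commits on "main".
git clone -q shared.git alice
cd alice
git config user.name Alice
git config user.email alice@example.com
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called main
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done
git push -q origin main
cd ..

# Bob clones and adds one local commit that Alice does not know about.
git clone -q -b main shared.git bob
cd bob
git config user.name Bob
git config user.email bob@example.com
echo local > bob.txt
git add bob.txt
git commit -q -m "bob's local work"
cd ..

# Alice rewrites history: she drops "commit 2" and force-pushes.
cd alice
git rebase -q --onto main~2 main~1 main
git push -q --force origin main
cd ..

# Bob fetches. His branch and the rewritten one now share only "commit 1".
cd bob
git fetch -q origin
counts=$(git rev-list --left-right --count main...origin/main)
ahead=$(echo "$counts" | cut -f1)
behind=$(echo "$counts" | cut -f2)
echo "Bob's main: $ahead ahead, $behind behind origin/main"
```

Bob made only one commit, yet his branch is reported as three commits ahead: from Git's point of view the old "commit 2" and "commit 3" are now his private work, and the rewritten "commit 3" is a stranger. A plain `git pull` at this point would merge the two lines of history, which is one way the dropped commit can quietly come back.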

Will any users with local changes that I do not know about be eligible for merge failures on git pull?

Absolutely, see above.

Am I missing anything essential here?

Avoid rewriting history at (almost) all costs!

Hematuria answered 29/9, 2009 at 7:8 Comment(9)
I had actually read that part of the manual, it feels like I've read all of the manuals RSN ;) Do the rewritten branches always get new identities? That does indeed explain why it should never be done... – Instantaneous
Because of the way Git identifies commits by their content and all previous commits, any change (however minor) to a commit will look like a totally new branch of development to Git. There is no way to make a rewritten history look "almost the same". – Hematuria
Thanks, strange how just missing a tiny piece of information in git can make all the difference. – Instantaneous
That's widely considered to be a feature of Git's repository format. What if that tiny piece of information was critically important? See: keithp.com/blogs/Repository_Formats_Matter for a great post on that topic. – Hematuria
Sorry, I'm not being totally clear: I hadn't understood that rewriting history has to create a derived branch. Now that I know, I no longer have any unanswered questions about rewriting history ;) I understand that I'm totally pulling the rug out from under any feature branches created after the point of rewrite. – Instantaneous
Now I understand your position. Glad it makes sense now! – Hematuria
"Avoid rewriting history at (almost) all costs!" It's been over ten years since you wrote this and I'm wondering if your position has changed. Back then, GitHub was a year old and may not have had pull requests. But now, it's common for people to squash and merge remote branches into trunk. Others rebase their personal branches onto trunk. This usually simplifies git history, which I consider a trade-off. Bear in mind, everything you said is still valid today. I guess I'm just curious about what "(almost)" means. – Ogata
@DanielKaplan My answer has nothing to do with GitHub or pull requests; the use of those doesn't change anything. The advice remains the same: Avoid rewriting history of any branch that you share with others. You can do whatever you like (rebase, squash, etc.) with a branch that other people aren't depending on. Sometimes, like in instances where a secret (password or other sensitive info) gets accidentally committed to a shared branch, you have to do something drastic. That's the "(almost)". – Hematuria
"My answer has nothing to do with GitHub" I brought it up to shed some light as to what the git ecosystem was like at the time of your answer. Git 1.0 was released 3 years prior, and AFAICT, your answer hasn't been edited since; I was just wondering if your opinion had changed since then. FWIW, I interpret "Avoid rewriting history at (almost) all costs!" and "Avoid rewriting history of any branch that you share with others" as 2 different statements, the former akin to saying (almost) "never rebase", "never reset to a commit", etc. I probably took it out of context. – Ogata
W
8

As mentioned in the comments on the other answer, in practice each commit is unique, and rewriting history will create new commits.
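A small sketch of why that is, using Git's plumbing commands in a scratch repository. The identity, timestamp, and file name are made up and pinned only so the result is reproducible. The two "child" commits below have the same tree, message, author, and date; they differ only in which parent they point to.

```shell
#!/bin/sh
# Sketch: a commit's id depends on its parent, not just on its own content.
# Identity, timestamp and file name are hypothetical, fixed for repeatability.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='1254207600 +0000' GIT_COMMITTER_DATE='1254207600 +0000'

echo hello > file.txt
git add file.txt
tree=$(git write-tree)            # one snapshot, reused by every commit below

# Two versions of the "same" parent: only the message differs.
parent_a=$(git commit-tree -m "parent, original message" "$tree")
parent_b=$(git commit-tree -m "parent, reworded message" "$tree")

# Identical children, attached to each parent in turn.
child_a=$(git commit-tree -p "$parent_a" -m "child" "$tree")
child_b=$(git commit-tree -p "$parent_b" -m "child" "$tree")
# Same inputs as child_a, to show the id is fully determined by them.
child_a_again=$(git commit-tree -p "$parent_a" -m "child" "$tree")

echo "child on original parent: $child_a"
echo "child on reworded parent: $child_b"
```

The two children get different ids even though nothing about the child itself changed, which is why touching one old commit gives every commit after it a new identity.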

You can think of it as cutting off the branches of a tree and then instantly growing new ones. They may even look the same but aren't. Yes, voodoo magic. In this analogy, reverting would be almost like supporting a falling branch with a log, so it will grow its way without falling down.

That leads us to a couple of good reasons to rewrite history:

  • Slim down a private repository before going public: for instance, create a new local private branch, test, test, rewrite, push.
  • Remove sensitive data from a private repo before going public.

Both cases point to what Greg already said: rewriting history can break things for everyone once the repository is public (that is, once the commits have been pushed). That is why I also advocate avoiding it even in private repos, just to keep the good habit. So rewriting history should be avoided at all costs, which really means giving it enough consideration before doing it: weigh up the pros and cons!

And there is at least one other reason, philosophical and often overlooked: rewritten history is lost data. True, a git history with reverts might look messier than one cleaned up with reset. But if done properly, all that "mess" can be hidden away in separate branches, and we can still see precisely at what point a revert was done, along with the reasons or evidence as to why it was done.
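As a rough illustration of the revert side of that trade-off, here is a sketch in a scratch repository (file name and commit messages are made up). The mistaken commit is undone by a new commit on top, and the original stays in the history.

```shell
#!/bin/sh
# Sketch: undoing a mistake with "git revert" instead of rewriting history.
# File name and messages are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name Demo
git config user.email demo@example.com

echo good > app.txt
git add app.txt
git commit -q -m "good change"

echo bad >> app.txt
git commit -q -am "mistaken change"
mistake=$(git rev-parse HEAD)

# Undo the mistake with a new commit on top; nothing existing is altered.
git revert --no-edit HEAD > /dev/null

commits=$(git rev-list --count HEAD)
content=$(cat app.txt)
kept=no
git merge-base --is-ancestor "$mistake" HEAD && kept=yes

echo "commits: $commits, file content: $content, mistake still in history: $kept"
```

The working tree is back to the good state, but the log still shows both the mistake and the point where it was reverted, and every existing commit id is unchanged, so nobody who already pulled is affected.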

Back to the tree analogy, even if you do remove the supporting log, the reverted branch will show the sinuous growing curves, and it is beautiful!

Warmblooded answered 17/9, 2013 at 20:16 Comment(3)
Note that there is a newer (2nd) edition of the "Rewriting History" chapter: git-scm.com/book/en/v2/Git-Tools-Rewriting-History – Homosexuality
I agree with telling people to learn and judge like adults when it's a good idea to do things. Rewriting history on feature branches that others are not (yet) working on is done frequently and is mandatory at our company. – Corse
@XavierAriasBotargues Perhaps that's indeed the most compelling day-to-day use for rewriting: in practice, it is much easier to just clean stuff up and remove a lot of useless clutter when doing solo development, and there is very little to no gain in going the extra mile to keep that history anyway. Still... I hope your company doesn't shut down those who revert. ;P – Warmblooded
