Git Cherry-pick vs Merge Workflow

Assuming I am the maintainer of a repo and want to pull in changes from a contributor, I see a few possible workflows:

  1. I cherry-pick each commit from the remote (in order). In this case git records the commit as unrelated to the remote branch.
  2. I merge the branch, pulling in all changes, and adding a new "conflict" commit (if needed).
  3. I merge each commit from the remote branch individually (again in order), allowing conflicts to be recorded for each commit, instead of grouped all together as one.
  4. For completeness, you could also do a rebase (is that the same as the cherry-pick option?); however, my understanding is that this can cause confusion for the contributor. Maybe that eliminates option 1.

In cases 2 and 3, unlike 1, git records the branch history of the commits.

What are the pros and cons of the cherry-pick and merge methods described? My understanding is that method 2 is the norm, but I feel that resolving a large set of changes with a single "conflict" merge is not the cleanest solution.

Meaghan asked 6/8, 2009 at 21:50

Both rebase (and cherry-pick) and merge have their advantages and disadvantages. I argue for merge here, but it's worth understanding both. (Look here for an alternative, well-argued answer enumerating cases where rebase is preferred.)

merge is preferred over cherry-pick and rebase for a couple of reasons.

  1. Robustness. The SHA1 identifier of a commit identifies it not just in and of itself but also in relation to all other commits that precede it. This offers you a guarantee that the state of the repository at a given SHA1 is identical across all clones. There is (in theory) no chance that someone has done what looks like the same change but is actually corrupting or hijacking your repository. You can cherry-pick in individual changes and they are likely the same, but you have no guarantee. (As a minor secondary issue, the new cherry-picked commits will take up extra space if someone else cherry-picks in the same commit again, as they will both be present in the history even if your working copies end up being identical.)
  2. Ease of use. People tend to understand the merge workflow fairly easily. rebase tends to be considered more advanced. It's best to understand both, but people who do not want to be experts in version control (which in my experience has included many colleagues who are damn good at what they do, but don't want to spend the extra time) have an easier time just merging.
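
To make point 1 concrete, here is a minimal Python sketch of how a Git commit ID is derived, assuming Git's standard SHA-1 object format (a `commit <size>\0` header followed by the commit text). The tree ID is Git's well-known empty tree; the parent IDs, author line and messages are hypothetical placeholders. It shows that two commits with identical content but different parents, which is what a cherry-pick produces, get different IDs, and that the difference carries on to every descendant.

```python
import hashlib
from typing import List

# Hypothetical placeholders: not taken from any real repository.
AUTHOR = "A U Thor <author@example.com> 1249595400 +0000"
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree ID


def commit_id(tree: str, parents: List[str], message: str) -> str:
    """SHA-1 over 'commit <size>\\0<body>', the layout Git uses for commit objects."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # parent IDs are part of the hashed text
    lines += [f"author {AUTHOR}", f"committer {AUTHOR}"]
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Same tree, same message, same author: only the parent differs.
original = commit_id(TREE, ["1" * 40], "Fix typo")       # on the contributor's branch
cherry_picked = commit_id(TREE, ["2" * 40], "Fix typo")  # re-created on the maintainer's branch
print(original == cherry_picked)  # False

# The difference propagates: a child of each copy also gets a different ID.
print(commit_id(TREE, [original], "Add test") == commit_id(TREE, [cherry_picked], "Add test"))  # False
```

In a real cherry-pick the committer timestamp, and usually the tree, differ as well, so the two IDs diverge for more than one reason.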

Even with a merge-heavy workflow, rebase and cherry-pick are still useful for particular cases:

  1. One downside to merge is cluttered history. rebase prevents a long series of commits from being scattered about in your history, as they would be if you periodically merged in others' changes. That is in fact its main purpose as I use it. What you want to be very careful of is never to rebase code that you have shared with other repositories. Once a commit is pushed, someone else might have committed on top of it, and rebasing will at best cause the kind of duplication discussed above. At worst you can end up with a very confused repository and subtle errors it will take you a long time to ferret out.
  2. cherry-pick is useful for sampling out a small subset of changes from a topic branch you've basically decided to discard but on which you've realized there are a couple of useful pieces.

As for preferring to merge many changes at once over merging them one at a time: it's just a lot simpler. It can get very tedious to do merges of individual changesets once you start having a lot of them. The merge resolution in git (and in Mercurial, and in Bazaar) is very, very good. You won't run into major problems merging even long branches most of the time. I generally merge everything all at once, and only if I get a large number of conflicts do I back up and re-run the merge piecemeal. Even then I do it in large chunks. As a very real example, I had a colleague who had 3 months' worth of changes to merge and got some 9000 conflicts in a 250,000-line code base. To fix it, we did the merge one month's worth at a time: conflicts do not build up linearly, and doing it in pieces resulted in far fewer than 9000 conflicts. It was still a lot of work, but not as much as trying to do it one commit at a time.

Cano answered 6/8, 2009 at 22:14. Comments (8):
Actually, in theory there is a chance that Mallory can corrupt your repository by creating commits with the same SHA1 but different content; it just probably won’t ever happen in practice. :) – Ravelin
Ha :) I meant "in theory the odds are so low that you can rely on it not happening", but you are right that it reads topsy-turvy. – Cano
What do you think about "merge --squash"? – Meaghan
@Ravelin If Mallory wants to be successful, she would have to specifically build the original commit and the second commit with the same SHA1. So another question could be: what are the odds of two (somewhat) bogus commits appearing and you not noticing? ;) – Cylindroid
Well, assuming you were deliberately trying to create a SHA1 collision, Stevens' attack would take about 2^60 SHA1 operations to find one. That is roughly one in a quintillion (2^60 ≈ 1.15 × 10^18). – Bombacaceous
9000 conflicts? I'd quit my job and become a beekeeper. – Songer
It's over 9,000! – Archducal
Three months' worth of working in isolation from other developers and then merging? Was he on a deserted island with no Internet? – Tuque

In my opinion, cherry-picking should be reserved for rare situations where it is required, for example if you made a fix directly on the 'master' branch (trunk, main development branch) and then realized that it should also be applied to 'maint'. You should base your workflow either on merge or on rebase (or "git pull --rebase").

Please remember that, from Git's point of view, a cherry-picked or rebased commit is different from the original (it has a different SHA-1 identifier), so it is different from the commit in the remote repository. (Rebase can usually deal with this, as it checks the patch ID, i.e. the changes, not the commit ID.)
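
The patch-ID idea can be illustrated with a simplified Python model. This is an approximation, not Git's exact `git patch-id` algorithm, and it will not reproduce the IDs Git prints: it hashes a unified diff after dropping `index` lines and hunk headers (which carry line numbers) and stripping whitespace. The file name and diffs are hypothetical.

```python
import hashlib
import re


def simple_patch_id(diff_text: str) -> str:
    """Simplified stand-in for a patch ID: hash the diff, ignoring line numbers and whitespace."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index ") or line.startswith("@@"):
            # Blob IDs and hunk line numbers depend on where the change is applied,
            # not on the change itself, so they are left out of the hash.
            continue
        h.update(re.sub(r"\s+", "", line).encode())  # whitespace-insensitive
    return h.hexdigest()


# Hypothetical diff of a commit on the contributor's branch.
original_diff = """diff --git a/greet.py b/greet.py
index 1111111..2222222 100644
--- a/greet.py
+++ b/greet.py
@@ -10,3 +10,3 @@ def greet():
     name = get_name()
-    print("Helo", name)
+    print("Hello", name)
     return name
"""

# The same change after a cherry-pick onto a base where the code sits lower in the file.
cherry_picked_diff = original_diff.replace(
    "index 1111111..2222222", "index 3333333..4444444"
).replace("@@ -10,3 +10,3 @@", "@@ -42,3 +42,3 @@")

# A genuinely different change.
other_diff = original_diff.replace('"Hello"', '"Hi"')

print(simple_patch_id(original_diff) == simple_patch_id(cherry_picked_diff))  # True
print(simple_patch_id(original_diff) == simple_patch_id(other_diff))  # False

# Rebase-style filtering: skip local changes whose patch ID is already upstream.
upstream = {simple_patch_id(cherry_picked_diff)}
to_replay = [d for d in (original_diff, other_diff) if simple_patch_id(d) not in upstream]
print(len(to_replay))  # 1
```

Two commits with different SHA-1s can therefore still be recognised as "the same change", which is how a rebase avoids replaying a commit that was already cherry-picked upstream.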

Also, in git you can merge many branches at once: a so-called octopus merge. Note that an octopus merge has to succeed without conflicts. Nevertheless, it might be useful.

HTH.

Overawe answered 8/8, 2009 at 12:22. Comments (4):
+1 for the point that rebase/cherry-picking actually "copy" the commits and therefore lose the linkage to the original commit. – Accusatory
We use cherry-pick in this fashion, exclusively to move commits for bug fixes (maybe VERY SMALL features) into an existing release branch to prepare a patch. Features that span multiple commits generally warrant going into a release branch that is based off of master. – Pung
@foxxtrot: Another solution is to create a separate branch for a bugfix, based on the oldest commit that exhibits this bug, and merge it into 'maint' and into 'master'... though in this case you need to know that said bugfix applies to both branches. – Highmuckamuck
@Jakub Two commands are indispensable for creating and merging a bugfix branch: git blame to find the commit that introduced the bug, and git branch --contains to determine where to merge the branch. This is described in more detail in this post. – Duda
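
The reachability check behind `git branch --contains` can be modelled in a few lines of Python. This is a toy sketch, not Git's implementation: the commit graph, branch names and the bug-introducing commit `B` are hypothetical, and a branch is taken to "contain" a commit when that commit is reachable from the branch tip through parent links. It then applies the bugfix-branch approach from the comments above: a fix commit based on `B` is merged into every branch that contains `B`.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical history (each commit maps to its parents):
#   A <- B <- C            maint
#        B <- D <- E       master
#   A <- F                 old-release (branched off before B)
graph: Dict[str, List[str]] = {"B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"], "F": ["A"]}
branches: Dict[str, str] = {"maint": "C", "master": "E", "old-release": "F"}


def ancestors(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """Every commit reachable from tip by following parent links (tip included)."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        for parent in graph.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def branches_containing(graph: Dict[str, List[str]], branches: Dict[str, str], commit: str) -> List[str]:
    """Toy model of `git branch --contains <commit>`: branches whose tip reaches commit."""
    return sorted(name for name, tip in branches.items() if commit in ancestors(graph, tip))


bug_commit = "B"  # assume `git blame` pointed at B as the commit that introduced the bug
targets = branches_containing(graph, branches, bug_commit)
print(targets)  # ['maint', 'master']

# Bugfix branch: hypothetical fix commit X is based on B, then merged into each target.
graph["X"] = [bug_commit]
for name in targets:
    merge_commit = f"M-{name}"
    graph[merge_commit] = [branches[name], "X"]  # merge commit with two parents
    branches[name] = merge_commit
print(branches_containing(graph, branches, "X"))  # ['maint', 'master']
```

Because the fix is based on the commit that introduced the bug rather than on either branch tip, merging it into 'maint' does not drag along unrelated changes from 'master', and 'old-release', which never had the bug, is left alone.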

Rebase and cherry-pick are the only ways you can keep a clean commit history. Avoid using merge and avoid creating merge conflicts. If you are using Gerrit, set one project to "Merge if necessary" and another to cherry-pick mode, and try it yourself.

Tropine answered 13/12, 2017 at 6:10. Comments (9):
It's not at all clear how this answers the question; maybe some examples would shed some light. – Acton
The fact that your history would look straight doesn't imply that it would be easier to understand. – Belabor
Merging is the usual way to have a clean history. Cherry-pick and rebase are mostly used for situations where you must modify the history, which means that merging should always be the first choice, because rebase changes commit SHAs, which is very dangerous when you work with remotes and multiple people. – Bubal
This guy right here deserves a medal. He knows he'll continue to get down-voted but it's the right answer. Kudos. – Braise
Sorry, I did not see these comments until now. Please try it out in your test environment before concluding, and do what works for you! I have about 600 developers contributing to multiple product branches. I don't care what developers do in their local workspaces; when a change is submitted for integration, it should be cherry-pickable to the development branch or sometimes to a release or bug-fix branch. FYI, I use Gerrit. – Tropine
So we shouldn't use merge, and should cherry-pick each commit instead? – Canonize
@NagarajMagadum If you don't like the commits your developers are making, you should have them clean up the histories in their own branches using rebase before sending a pull request. Using cherry-picking as a normal workflow is just broken. – Saddlebag
@Saddlebag By setting cherry-pick as the default merge mode, I don't worry about what happens in developer workspaces. As long as Gerrit can cherry-pick the developer's change, we are fine. I have hooks in place to make sure changes coming in for review are always rebased onto the latest HEAD. There are occasions where we have long-lead feature branches, and for those we allow merges, but that is very rare. With a large number of developers distributed across multiple time zones, it is very important to make sure the commit history is clean and that at any given time any PR is revertible if the build or tests are broken. – Tropine
Also see #11634378. – Tropine

© 2022 - 2024 — McMap. All rights reserved.