How do I split up a large Git branch into lots of smaller branches?

6

47

I have imported from SVN into Git, and now I have one big branch, like this:

  • work on feature C
  • work on feature B
  • work on feature C
  • work on feature C
  • work on feature B
  • work on feature A

I want separate feature branches, for A, B, C. I'm cherry picking commits to new branches but this doesn't remove them from the original branch so I have to manually track which ones I have pulled out.

There are around 800 commits to split up, and maybe 50 features/bugfixes.

It would be nice if the commits I have pulled out were marked somehow in the git log, so I know which ones I have already done. Is this possible?

I can rebase the entire branch, skipping the commits I have pulled out, but I'm worried this will cause lots of conflicts. I don't want to resolve 500 conflicts every time I pull a commit out.

What's the best method of pulling out commits from one uber branch onto smaller feature branches, whilst keeping track of your progress?

Ray answered 22/9, 2012 at 3:31 Comment(0)
52

What I do in this case is use interactive rebase.

At your HEAD, create your branches A, B, and C. Also create a "backup" branch (you could name it backup) in case things go wrong and you need your original HEAD back.

git branch feature-a
git branch feature-b
git branch feature-c
git branch backup-before-rebase

Then, create a branch at the commit you want them to start from, maybe at a convenient stable commit. Call it new_trunk or something.

git checkout HEAD~50       ## this will be the new tree-trunk
git branch new_trunk

Then, do interactive rebases and pick out the commits you want to keep in that branch. Used this way, it's basically like cherry-picking in bulk.

git checkout feature-a
git rebase -i new_trunk    ## -i is for "Interactive"

When you're done, you should have 3 branches with separate histories starting from new_trunk and a backup branch reflecting the old HEAD if you still need it.
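
For anyone unsure what to do inside the rebase editor: keep the "pick" lines for the commits that belong on the branch you are rebasing and delete the rest. Here is a rough sketch of that on a throwaway repository. It rebuilds the history from the question with one file per feature (an assumption, so the picks cannot conflict) and uses sed through GIT_SEQUENCE_EDITOR as a stand-in for editing the todo list by hand. The file names and commit messages are made up for the demo.

```shell
#!/bin/sh
# Throwaway demo repository; all names below are illustrative.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "stable base"
git branch new_trunk                 # the common starting point

# Recreate the interleaved history from the question (oldest first),
# one file per feature so that the picks cannot conflict.
n=0
for f in A B C C B C; do
    n=$((n + 1))
    echo "change $n" >> "feature-$f.txt"
    git add "feature-$f.txt"
    git commit -q -m "work on feature $f"
done

git branch backup-before-rebase
for f in a b c; do
    git branch "feature-$f"
done

# On each feature branch, keep only that feature's "pick" lines in the todo.
# Done by hand, this is the same as deleting the other lines in the editor.
for f in a b c; do
    upper=$(printf '%s' "$f" | tr 'abc' 'ABC')
    git checkout -q "feature-$f"
    GIT_SEQUENCE_EDITOR="sed -i.bak -n '/work on feature $upper/p'" \
        git rebase -i new_trunk
done

git log --oneline --graph --all
```

Deleting a line from the todo only leaves that commit off the branch being rebased. It is still reachable from backup-before-rebase, which is why the backup branch matters.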

Duchamp answered 22/9, 2012 at 8:20 Comment(9)
Also look into rerere to help if you run into the same conflicts again and again. - Duchamp
I usually use tags instead of branches for stuff that is supposed to remain unchanged, especially backups. - Trematode
Tags are good too. new_trunk is a good candidate for a tag instead of a branch. - Duchamp
This is a good suggestion, but it doesn't solve the difficult problem I outlined: how to track which commits on the original branch have been cherry-picked/rebased onto another branch? One idea is simply to add a tag to every commit after picking it out. Another idea is to pick the commits in order one by one and keep a tag up to date as to where I've got to. I suspect there is a better way, though. - Ray
I'm not clear on why this doesn't do that: the "path" from the root branch is reflected in the git tree. Can you elaborate in the question on what forensics you need to do after your reorganization is done? - Duchamp
When you rebase, the commits you pick are duplicated onto the new branch. As far as I know, the log does not provide any "equality" visuals to say that commit A == commit X and commit B == commit Y. The only thing it does is share the root commit. For example, say I create a feature branch and rebase it, skipping commits 2 and 3 but keeping 1 and 4. The new branch will start from commit 1, but commit 4 will be duplicated, and there is no way of seeing visually that the 'new' commit 4 (on the new branch) is the same as the old commit 4. - Ray
That's correct, because to git they are not the same. The commit message will be preserved (unless you change it yourself during the rebase). You can also git diff the two commits if you want to check whether they really are the same, but I don't know of any clients that do this systematically. - Duchamp
This answer seems incomplete. How do I use the interactive rebase? "pick" for the commits I want on the feature branch, and "drop" or delete lines for the commits I don't want on that feature branch? I won't lose any commits, as warned in the interactive rebase editor comments? - Flowerdeluce
"Also create a "backup" branch (you could name it backup) in case things go wrong and you need your original HEAD back." - Duchamp
10

Personally, I would think hard about the pros and cons of such a large change (and think again if you have already started). If you run into conflicts, which are annoying and hard to resolve in any large rebase or cherry-pick, you will probably also have a tough time merging the features back into your "master" branch.

Wouldn't it be better/easier to freeze your big branch, get it "done" (or "good enough"), and create new feature branches from it? (Or split out only some branches?)

But to your question:

If you want to track the missing commits automatically, use the git cherry command:

git cherry featureBranch bigBranch

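git cherry prefixes each commit with + (no equivalent change in featureBranch yet) or - (an equivalent change is already there). If there were no conflicts while cherry-picking or rebasing your feature branch, you can filter on that prefix with some additional pipes (a pick that needed conflict resolution no longer has an identical patch, so it still shows up as +):

git cherry featureBranch bigBranch | awk '$1 == "+" { print "pick " $2 }' | tee remaining

This prints the commits that are still missing from featureBranch and saves them to a file called "remaining". You can paste this list into an interactive rebase of bigBranch to throw away the commits you no longer want there. (Maybe you could script it even further by using "ed" as the git editor and passing commands to the standard input of the interactive rebase, but I haven't tried it.)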
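
With several unmerged feature branches, one possible approach (a sketch, not something I have run on a real 800-commit history) is to run git cherry once per feature branch and keep only the commits that come back as + against every one of them. The demo below builds a throwaway repository with made-up branch names, pulls features A and B out by cherry-picking, and then lists what is left on bigBranch.

```shell
#!/bin/sh
# Throwaway demo repository; branch and file names are illustrative.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "stable base"
git branch new_trunk
git checkout -q -b bigBranch

n=0
for f in A B C C B C; do
    n=$((n + 1))
    echo "change $n" >> "feature-$f.txt"
    git add "feature-$f.txt"
    git commit -q -m "work on feature $f"
done

# Pull features A and B out by cherry-picking; feature C is still to do.
for f in A B; do
    git checkout -q -b "feature-$f" new_trunk
    git cherry-pick $(git rev-list --reverse --grep="feature $f" new_trunk..bigBranch) >/dev/null
done

# A commit is still "remaining" only if git cherry reports it as "+"
# against every feature branch created so far.
set -- feature-A feature-B
for b in "$@"; do
    git cherry "$b" bigBranch
done | awk -v total="$#" '$1 == "+" && ++seen[$2] == total { print $2 }' |
while read -r sha; do
    git log -1 --format='pick %h %s' "$sha"
done > remaining.txt

cat remaining.txt
```

In this demo only the three "work on feature C" commits should be left in remaining.txt. The same caveat applies as above: a pick that needed conflict resolution will keep showing up as remaining.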

Armlet answered 22/9, 2012 at 9:6 Comment(2)
Didn't know about git cherry. Great tip. - Duchamp
Good point with git cherry. However, can I use this with multiple unmerged feature branches? I don't want to merge all potential feature branches into one big muddled branch simply to compare which commits are remaining. Would I have to compare every feature branch to the original branch separately, and then somehow cross off all commit IDs which aren't in every comparison? What about somehow tagging each commit as I cherry-pick them out? - Ray
8

Just to simplify willoller's answer further,

Make the feature branches, plus a backup just in case:

git branch feature-a
git branch feature-b
git branch feature-c
git branch backup-before-rebase

Then check out a feature branch and do an interactive rebase onto the commit you want it to start from:

git checkout feature-a
git rebase -i <safecommit>

If you want some feature branches to share commits (to keep your tree clean), don't create the later feature branch at the start. Instead, create it from a feature branch you have already rebased, and use the shared commit as your next safecommit:

#on branch feature-a
git checkout -b feature-d
git rebase -i <sharedcommit>
Danaedanaher answered 23/5, 2014 at 21:1 Comment(0)
2

Another method I have just found out about is using "git notes".

http://alblue.bandlem.com/2011/11/git-tip-of-week-git-notes.html

This feature allows adding comments to existing commits without actually changing the branch / requiring a rebase. One method of tracking which commits have been pulled out is to add a git note to each one:

Cherry-picked to features/xyz 925a5239d4fbcf7ad7cd656020793f83275ef45b

This could help in a largely manual process - you could write a little script to cherry pick a commit to a particular branch then add the relevant git note back to the original commit.

Alternatively, if you want to get really funky, you could automate the whole process:

  1. Add a git note to every commit, saying which feature branch you want it cherry-picked to: TOCHERRYPICK: features/xyz
  2. Write a script to scan all the git notes, automatically create all the feature branches, and cherry-pick the selected commits. It could then change the git note to CHERRYPICKED: features/xyz at 925a5239d4fbcf7ad7cd656020793f83275ef45b to allow the tool to be re-run later to pick out more commits.
  3. If you are really keen to make it prominent when a commit has been cherry-picked, you could also automate the creation of a tag with a similar name: CHERRYPICKED/<branch>/<SHA> (Git ref names cannot contain a colon).
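
A minimal sketch of the manual variant, run against a throwaway repository. pick_and_note is a hypothetical helper name of my own, not a Git command, and the branch and file names are made up. It cherry-picks one commit onto a feature branch and then appends a note to the original commit.

```shell
#!/bin/sh
# Throwaway demo repository; names are illustrative.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "stable base"
git branch new_trunk
git checkout -q -b bigBranch
echo "xyz work" > xyz.txt
git add xyz.txt
git commit -q -m "work on feature xyz"
echo "other work" > other.txt
git add other.txt
git commit -q -m "work on something else"

# pick_and_note is a hypothetical helper, not a Git command:
# cherry-pick one commit onto a branch, then record that on the original.
pick_and_note() {
    target=$1
    commit=$2
    git checkout -q "$target"
    git cherry-pick "$commit" >/dev/null
    git notes append -m "Cherry-picked to $target $(git rev-parse HEAD)" "$commit"
}

git branch features/xyz new_trunk
pick_and_note features/xyz "$(git rev-parse bigBranch~1)"

# %N prints the note; it stays empty for commits not yet pulled out.
git log --format='%h %s | %N' new_trunk..bigBranch
```

Notes live under refs/notes/commits, so the commits on the original branch keep their IDs. Commits with an empty note column in the log output are the ones still to be pulled out.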
Ray answered 4/11, 2015 at 12:9 Comment(0)
2

I honestly wouldn't do this unless you have a huge list of commits that need to be split up and the features are very independent, i.e. they don't alter the same lines, which would leave conflicts to resolve.

As others have suggested, create a new branch for each feature and use git rebase --interactive to include the desired commits.

To ensure none go astray, create the contents of the git-rebase-todo files by

  • editing a list of all the desired commits and classifying them by feature
  • separating the list of commits into separate files

You can create the list of commits by using a command like

git log --oneline --reverse  44e19^... > log.txt

to display commit 44e19 onwards. This will give you a file like this

44e1936 dSRGratuities (SummaryRecord)
67fedda Receipt Report HEADER: 20! multiply by Paym_FieldCount
69d70e2 Receipt Report: Payment
....

which when edited (to add classification: feature a,b,c etc) might look like my sorted.txt

c 44e1936 dSRGratuities (SummaryRecord)
a 67fedda Receipt Report HEADER: 20! multiply by Paym_FieldCount
b 69d70e2 Receipt Report: Payment
c abea7db Receipt Report: Cashback
a cf96185 Receipt Report: Gratuity
c 70e987a Receipt Report: use amount tendered for printing
a 7722ac8 Receipt Report: use amount tendered for calculations
c 47f1754 Receipt Report: store amount tendered
b b69a73f Receipt Report: Use enum Paym_FieldCount
a 9a0b471 Receipt Report HEADER: enum PaymentEntries (with Paym_FieldCount)
c ad67e79 Use SharpReport enum
b 3c510c6 enum SharpReport
a e470e07 m_Gratuities m_dSSGratuities (SalesSummary)
b 4e0c3e4 m_Gratuities m_szGratuities (SalesSummaryRecord)
b bd054f7 _gx_fn_Cashback

Then write a script in your favorite scripting language to turn the sorted list into a collection of git-rebase-todo files. Your script might resemble the one I just wrote.

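# Note: foreachline is not a built-in Tcl command; it is assumed here to be
# a helper that runs the body once for each line of sorted.txt.
foreachline text sorted.txt {
    set fields  [split $text { }]
    set branch  [lindex $fields 0]           ;# classification letter, e.g. a
    set commit  [lindex $fields 1]           ;# abbreviated commit id
    set comment [string range $text 10 end]  ;# description, from character 10
    set command "echo pick $commit $comment"
    # Append the "pick" line to the todo file for this feature
    exec cmd /S /C $command >> $branch.txt
}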

The script reads the commit sorting file line by line and splits by a space character { } to get the two fields branch and commit, and takes a substring (characters 10 onwards) for a description of the commit. The description isn't required but it's useful for us humans to check for mistakes.

It then puts a line into the appropriate git-rebase-todo file, creating one file per feature. I hacked this by executing a very ugly Windows echo string >> file command.

This creates a number of files, e.g. my file a.txt

pick 67fedda Receipt Report HEADER: 20! multiply by Paym_FieldCount
pick cf96185 Receipt Report: Gratuity
pick 7722ac8 Receipt Report: use amount tendered for calculations
pick 9a0b471 Receipt Report HEADER: enum PaymentEntries (with Paym_FieldCount)
pick e470e07 m_Gratuities m_dSSGratuities (SalesSummary)

The whole thing is ugly. I don't recommend it unless you have to do it and are good at writing scripts.
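
If you are not on Windows or don't have Tcl to hand, the same split can be sketched with POSIX sh and awk. This is a rough equivalent of the script above, not the script I actually used, and the sorted.txt here is a shortened copy of mine.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch real files.
set -e
cd "$(mktemp -d)"

# Shortened copy of sorted.txt: <feature> <commit> <description>
cat > sorted.txt <<'EOF'
c 44e1936 dSRGratuities (SummaryRecord)
a 67fedda Receipt Report HEADER: 20! multiply by Paym_FieldCount
b 69d70e2 Receipt Report: Payment
c abea7db Receipt Report: Cashback
a cf96185 Receipt Report: Gratuity
EOF

# Write one todo file per feature: a.txt, b.txt, c.txt.
awk '{
    branch = $1
    line = $0
    sub(/^[^ ]+ /, "", line)            # drop the classification field
    print "pick " line > (branch ".txt")
}' sorted.txt

cat a.txt
```

Unlike the fixed character offset in the Tcl version, this strips everything up to the first space, so it does not depend on the classification being a single letter.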


I wrote the text above some time ago, and I have had a little rethink about things. Above I implied that this is a lot of work and not worth doing, but I have since seen situations where it looks like someone has done the above and it has been very worthwhile.

I remember releases of Visual Studio for MFC/C++ where each new release would have compiler changes, IDE changes, MFC improvements, and run on a later version of Windows. This meant that if you wanted to get your compiler away from VS6 and Windows XP you might have to make language changes to satisfy the compiler, and function call changes to satisfy MFC, etc.

Now suppose that Microsoft had taken weekly backups as they developed Visual Studio, and someone methodically took the old backups and committed the code changes into a version control system like Git. Then they started classifying the changes ...

  • a. = compiler changes

  • b. = library changes

  • c. = IDE changes

  • d. = security improvements

    etc.

Microsoft could create branches for each of these, and start to have the latest and greatest IDE (c included), running on the newest Windows and still capable of compiling old legacy programs using the language (no a) and libraries (no b) they were written for.

Developers previously locked into legacy software could then make improvements in a logical and incremental fashion, e.g. language changes and library changes independently of each other, and do it on the latest and greatest Visual Studio without having to pass through all the intermediate versions.

Example

  <LanguageStandard>stdcpp14</LanguageStandard>

Now I am not saying this is what has happened, but it seems to me that the recent versions of Visual Studio are far better at allowing legacy programs to be updated, rather than thrown away and (never?) rewritten, and it appears to me to be due to version controlling and organizing old software changes into logical branches: compiler versions, DLL/library versions.

So, I can see occasions where splitting a huge number of old commits into distinct branches may be worthwhile.

On Visual Studio 2019 I can add the line

<PlatformToolset>v141_xp</PlatformToolset>

to a configuration file and manage to compile and run an old windows program which failed to compile and link with VS 2015, and VS 2017. It looks very much like someone at Microsoft has rebased performance and security improvements onto some old software while leaving out the breaking changes that often come with modernization.

Kobylak answered 17/10, 2017 at 13:13 Comment(0)
0

// on develop branch

git diff develop your/branch > diff.patch
git apply diff.patch
Balance answered 17/5, 2021 at 5:28 Comment(0)
