Vendor Branches in Git

A Git project has within it a second project whose content is being worked on independently.

Submodules cannot be used for the smaller project, because the subproject must be included whenever users clone or download the 'parent'.

Subtree-merging cannot be used, as the subproject is being actively developed, and subtree merging makes it very difficult to merge those updates back into the original project.

I have been informed that the solution is known in the SVN world as "Vendor Branches", and that it is so simple to do in Git that it does not even need addressing. Half-baked tutorials abound on the 'net.

Nonetheless, I cannot seem to get it to work.

Can someone please (pretty please?) explain how I can create a structure whereby one project exists within another, and both can be developed and updated from the same working directory? Ideally [or rather: it is quite important, even if unsupported], when a client downloads the 'parent' project, he should be given the latest version of the subproject automatically.

Please do NOT explain to me how I should use submodules or subtree-merges or even SVN:Externals. This thread is the outgrowth of the following SO thread, and if something was missed there, please DO post it there. This thread is trying to get an understanding of how to do Vendor Branches, and the longer, clearer, and more dummied an explanation I receive, the happier I will be.

Blackford answered 20/4, 2009 at 19:33 Comment(12)
Just added some thoughts based on your comment: do not hesitate to refine further if that does not completely address the situation. - Restive
Just added further thoughts on your requirements, but still without a definitive answer. Sorry for that. - Restive
Just saw your comments on Paul's answer ( stackoverflow.com/questions/720669/… ): he is certainly more advanced in Git matters than I am. I hope he will provide more details (but he might have to ask you for more details about what is blocking you, though). - Restive
Ruby? Git has nothing to do with Ruby... it was designed for the Linux kernel. Git bash has always worked fine for me the rare day I'm on a Win32 system. - Accouchement
@singpolyma: Yes, GIT was created (rather hastily) for Linux by Torvalds himself, and Ruby only adopted it after it had reached a beta stage. Nonetheless, in many tutorials that give examples (that I've encountered), the code is Ruby instead of straight Bash. I'm happy for you - both for your not having to be on Win, and for your having GIT work when you need it to. I suspect you are the exception, not the rule. - Blackford
@singpolyma: The crux of my argument has been confirmed. GIT is not ready for primetime and msysgit is nigh hopeless (workflow: push..failed - push...failed - push..success - sigh). The Ruby comment is entirely off-topic, but you should check out: github.com/rails/rails/tree/master..... And the announcement: weblog.rubyonrails.org/2008/4/2/rails-is-moving-from-svn-to-git. - Blackford
It's on topic... you keep arguing that Git is tied to Ruby. It's not. Rails is not Ruby. - Accouchement
@SamGoody: If GIT is not ready for prime time then how come it is used by so many projects? I also find GIT tutorials very intuitive. - Ricky
@EFraim: GIT has come a long way since April (see the commit messages). And now that we all use distributed VC, it is the best out there. But for Windows users, it still falls short on stuff that should be in v1 (like line endings, or ignoring from a specific state). But why did that earn me a -1? - Blackford
What you are describing is not a vendor branch. A vendor branch is where you maintain a custom version of a third party product, and the third party provides you only the source code, not access to their repository. When you receive an update from the third party, you commit it to your vendor branch, then merge their changes into your master branch. - Wheaton
Thanks for asking this question. You have managed to put into words a question which has been nagging me for weeks. I am trying to work out how to manage a patchwork git repository of Drupal core, custom modules and contributed modules etc, and this question is so relevant to that. - Darya
Another approach worth mentioning here is how the Chromium team solves this problem using DEPS files managed by tools they ship with depot_tools. - Jailbreak

I think submodules are the way to go when it comes to "vendor branch".
Here is how you should use submod... hmmm, just kidding.

Just a thought: you want either

  • to develop both main project and sub-project within the same directory (which is called a "system approach": you develop, tag and merge the whole system)
  • or to view your sub-project as a "vendor branch" (a branch that gives you access to a well-defined version of an external vendor component, or "set of files", and that is only updated at each new release of that component: that is called a "component approach", where the whole system is viewed as a collection of separate components developed on their own)

The two approaches are not compatible:

  • The first strategy is compatible with a subtree-merge: you are working both on project and sub-project.
  • The second one is used with submodules, but submodules are used to define a configuration (the list of tags you need in order to work): each Git submodule, unlike svn:externals, is pinned to a particular commit id, and that is what allows you to define a configuration (as in SCM: "software configuration management")
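
To make the "vendor branch" notion concrete, here is a minimal sketch of the plain import-and-merge form of the pattern (not the submodule form preferred above). The repository layout, branch names, tags and file contents are all made up for the demonstration; it only assumes a reasonably recent Git.

```shell
#!/bin/sh
# Vendor-branch sketch: every path, branch name and tag below is hypothetical.
set -e
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.email "dev@example.com"
git config user.name "Dev"

# 1. Import the vendor's 1.0 drop on a dedicated "vendor" branch.
git symbolic-ref HEAD refs/heads/vendor
mkdir lib
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > lib/lib.txt
git add lib
git commit -q -m "Import vendor lib 1.0"
git tag vendor-1.0

# 2. Our own development starts from that import and patches the lib.
git checkout -q -b main
printf 'alpha (our patch)\nbeta\ngamma\ndelta\nepsilon\n' > lib/lib.txt
git commit -q -am "Local customization of vendor lib"

# 3. A new vendor release arrives: record it on the vendor branch only.
git checkout -q vendor
printf 'alpha\nbeta\ngamma\ndelta\nepsilon v1.1\n' > lib/lib.txt
git commit -q -am "Import vendor lib 1.1"
git tag vendor-1.1

# 4. Merge the new release into main; the local patch survives the merge.
git checkout -q main
git merge -q -m "Merge vendor lib 1.1" vendor
cat lib/lib.txt
```

The vendor branch never receives local changes, so each merge into main carries only what changed between two vendor releases.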

I like the second approach because most of the time, when you have a project and a sub-project, their lifecycle is different (they are not developed at the same rhythm, not tagged together at the same time, nor with the same name).

What really prevents that approach ("component-based") in your question is the "both can be developed and updated from the same working directory" part.
I would really urge you to reconsider that requirement, as most IDEs are perfectly capable of dealing with multiple "source" directories, and the sub-project development can be done in its own dedicated environment.


samgoody adds:

Imagine an eMap plugin for both Joomla and ModX. Both the plugin and the Joomla-specific code (which is part of Joomla, not of eMap) are developed while the plugin is inside Joomla. All paths are relative, the structure is rigid, and they must be distributed together - even though each project has its own lifecycle.

If I understand correctly, you are in a configuration where the development environment (the set of files you are working on) is much the same as the distribution environment (the same set of files is copied onto the release platform).

It all comes down to a granularity issue:

  • if both sets of files cannot exist one without the other, then they should be viewed as one big project (and subtree-merged), but that forces them to be tagged and merged as one.
  • if one depends on the other (which can be developed alone), then they should each be in their own Git repository and project, the first one depending on a specific commit of the second as a submodule: if the submodule is defined in the right subtree of the first component, all relative paths are respected.

samgoody adds:

The original thread listed issues with submodules - primarily that GitHub's download doesn't include them (vital to me) and that they get stuck on a particular commit.

I am not sure GitHub's download is still an issue: the "Guides: Developing with Submodules" article does mention:

Best of all: people cloning your my-awesome-framework fork will have no problem pulling down your my-fantastic-plugin submodule, as you’ve registered the public clone URL for the submodule. The commands

$ git submodule init
$ git submodule update

Will pull the submodules into the current repository.

As for the "they get stuck on a particular commit": that is the whole point of a submodule, allowing you to work with a configuration (a list of tagged versions of components) instead of the latest, potentially unstable, set of files.
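
A small sketch of that pinning behaviour, using throwaway local repositories. The names `plugin`, `parent`, `client` and the path `vendor/plugin` are invented for the demo, and `protocol.file.allow=always` is only there because recent Git versions restrict submodules that point at local paths; it would not be needed with a normal remote URL.

```shell
#!/bin/sh
# Submodule pinning sketch: repository names and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# A sub-project with one commit.
git init -q plugin
cd plugin
git config user.email "dev@example.com"
git config user.name "Dev"
echo "v1" > plugin.txt
git add plugin.txt
git commit -q -m "plugin v1"
cd ..

# The parent registers the sub-project as a submodule under vendor/plugin.
git init -q parent
cd parent
git config user.email "dev@example.com"
git config user.name "Dev"
echo "parent" > README
git add README
git commit -q -m "parent initial"
git -c protocol.file.allow=always submodule --quiet add "$work/plugin" vendor/plugin
git commit -q -m "Add plugin as submodule"
pinned=$(git rev-parse HEAD:vendor/plugin)   # commit id recorded by the parent
cd ..

# The sub-project moves on; the parent's recorded commit does not change.
cd plugin
echo "v2" > plugin.txt
git commit -q -am "plugin v2"
cd ..

# A client clones the parent and gets the pinned commit, not the newest one.
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/parent" client
cloned=$(git -C client/vendor/plugin rev-parse HEAD)
echo "pinned=$pinned cloned=$cloned"
cat client/vendor/plugin/plugin.txt
```

`git clone --recurse-submodules` performs the `git submodule init` and `git submodule update` steps quoted above as part of the clone.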

samgoody mentions:

I need to avoid both subtrees and submodules (see question), and would rather address this need without arguing too much if the approach is justified

Your requirement is a perfectly legitimate one, and I do not want to judge its justification: my previous answers are only here to provide a larger context and try to illustrate the options usually available with a generic SCM tool.

Subtree merge should be the answer here, but it would mean merging back only the commits made to files of the main project, and not the commits made to the sub-project. If you can manage that kind of partial merge, I would reckon it is the right path to follow.
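
For reference, a sketch of the subtree-merge setup being discussed, covering only the easy direction (pulling the sub-project into the parent and later updating it). The selective merge back is not shown. Repository names, the remote name `plugin` and the prefix `vendor/plugin` are hypothetical, and `--allow-unrelated-histories` assumes Git 2.9 or later.

```shell
#!/bin/sh
# Subtree-merge sketch: names and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# The independently developed sub-project.
git init -q plugin
cd plugin
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
echo "v1" > plugin.txt
git add plugin.txt
git commit -q -m "plugin v1"
cd ..

# The parent project.
git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
echo "parent" > README
git add README
git commit -q -m "parent initial"

# Graft the sub-project's history in, with its files under vendor/plugin/.
git remote add plugin "$work/plugin"
git fetch -q plugin
git merge -q -s ours --no-commit --allow-unrelated-histories plugin/main
git read-tree --prefix=vendor/plugin/ -u plugin/main
git commit -q -m "Merge plugin as subtree in vendor/plugin"

# The sub-project publishes an update...
cd ../plugin
echo "v2" > plugin.txt
git commit -q -am "plugin v2"

# ...which the parent merges into the same subdirectory.
cd ../parent
git fetch -q plugin
git merge -q -X subtree=vendor/plugin -m "Update plugin to v2" plugin/main
cat vendor/plugin/plugin.txt
```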

I do not see however a native Git way to do what you want that does not use subtree-merge or submodule.
I hope a true Git guru will post here a more adequate answer.

Restive answered 20/4, 2009 at 20:28 Comment(6)
Imagine an eMap plugin for both Joomla and ModX. Both the plugin and the Joomla-specific code (which is part of Joomla, not of eMap) are developed while the plugin is inside Joomla. All paths are relative, the structure is rigid, and they must be distributed together - even though each project has its own lifecycle. I'm not up-to-date with IDEs; our office switched last year from Eclipse to Notepad++ for HTM/PHP/JS, and from Flex to FD for AS [Productivity has gained]. Am I missing something big? - Blackford
Your answer: If the projects are interdependent, subtree-merge. Otherwise use submodules. I need to avoid both subtrees and submodules (see question), and would rather address this need without arguing too much if the approach is justified. In my case the larger project must include the subproject but not vice versa. The original thread listed issues with submodules - primarily that GitHub's download doesn't include them (vital to me) and that they get stuck on a particular commit. Any ideas? - Blackford
This is a very interesting answer. I have not heard of a comparison between component and system approaches to SCM before. Where does that concept come from? i.e. Is there some seminal academic paper/blog post or book that popularised these thoughts? If so, could you provide a reference to it please? - Darya
@Darya I first encountered those concepts in my past experience with ClearCase: https://mcmap.net/q/13317/-clearcase-ucm-best-practices-using-components - Restive
Thanks for the reference VonC. I will look into it in detail soon. I guess, from a cursory glance, these ideas seem to derive from IBM's Unified Change Management. redbooks.ibm.com/redbooks/SG246399/wwhelp/wwhimpl/common/html/… which is something I've added to my reading list. - Darya
@Darya yes, UCM. I have been critical of the implementation of UCM because of parasite baselines, but the concept was sound. - Restive

I finally have a few hours' access to the net before I head back to the mountains. We'll see if I can contribute any clarity to your situation.

My (probably oversimplified) understanding is you have (offsite) vendor(s) developing plug-in(s) for your project where your (in-house) team is developing code for your main project using an externally sourced framework. Vendor doesn't make changes to your code and probably doesn't need your bleeding edge development, but does need your stable code to develop and test their work. Your team doesn't make changes to the framework, but does sometimes contribute changes to the plug-in.

  1. Like VonC (who usually thinks things thru very thoroughly) I don't think Git has a perfect fit for your requirements. And like him, I think using subtree merge pattern is the closest fit. I'm not a Git guru, but I have been successful at bending Git to a wide range of needs. Maybe Git doesn't meet your needs:

    • SVN will let you have multiple repos within one, which seems important for you. I think this would mean either using externals or the Vendor Branch pattern to come close to what you want.

    • Mercurial has an extension, Forest, for using nested repos, which seems to fit your mental model better. I chose Git over Mercurial 15 months ago, but HG was stable and for many uses I think it is comparable to Git. I don't know how stable the extension is.

  2. If I were in your situation, I'd use two Git repos -- one for the Plugin and one for the MainProject. The vendor would do development in the Plugin repo and would have a release branch that they pull current versions of the plug-in into without the rest of the development environment. That branch would be pulled into the MainProject repo as a vendor branch, and then merged into your main development branch. When your team works on a change to the plug-in, they develop it in a feature branch off of your main development branch and submit it to the vendor repo as patches. This gives you a very clean workflow, relatively easy to set up and learn, while keeping the development history segregated.

    I'm not trying to be argumentative, but simply to say this is Git's best fit for my understanding of your situation. The easiest way to set this up would use the subtree merge, but this does not run changes thru it in both directions, which was my objection to using that pattern.

  3. If your team is really actively involved in the plugin development or you really want to have the development history of both project and plug-in integrated in one Git repo, then just use one Git repo. You can extract the plug-in and its history for the records of your vendor as explained here, from time to time. This may give you less encapsulation than you intend, but Git is not designed for encapsulation -- Git's data structure is based on tracking changes within one whole project.
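
As a rough illustration of the "submit it to the vendor repo as patches" step from option 2, here is a sketch using `git format-patch` and `git am`. It assumes the plug-in's files already sit under `vendor/plugin` in the main project (simulated here with a plain copy rather than a real vendor branch), and every repository, branch and file name is invented for the example.

```shell
#!/bin/sh
# Patch round-trip sketch: names and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# The vendor's plug-in repository.
git init -q plugin
cd plugin
git symbolic-ref HEAD refs/heads/main
git config user.email "vendor@example.com"
git config user.name "Vendor"
printf 'line one\n' > plugin.txt
git add plugin.txt
git commit -q -m "plugin v1"
cd ..

# The main project, with the plug-in's files under vendor/plugin.
# (Assumption: a plain copy stands in for the vendor-branch merge.)
git init -q mainproject
cd mainproject
git symbolic-ref HEAD refs/heads/main
git config user.email "team@example.com"
git config user.name "Team"
mkdir -p vendor/plugin
cp ../plugin/plugin.txt vendor/plugin/plugin.txt
echo "main" > README
git add .
git commit -q -m "main project with plugin v1"

# The in-house team fixes the plug-in on a feature branch.
git checkout -q -b plugin-fix
printf 'line one\nline two\n' > vendor/plugin/plugin.txt
git commit -q -am "plugin: add line two"

# Export the fix with paths rewritten relative to the plug-in directory.
git format-patch -q --relative=vendor/plugin -o "$work/patches" main..plugin-fix

# The vendor applies the patch in their own repository.
cd ../plugin
git am -q "$work"/patches/*.patch
git log -1 --format=%s
```

`--relative=vendor/plugin` strips the prefix from the paths in the patch, which is what lets it apply at the root of the vendor's repository.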

Maybe I've misunderstood your situation and none of this applies. If so I apologize. Thanks for the details that you and VonC have worked out, which have filled in many holes that I originally had in trying to understand your question.

Whyte answered 22/4, 2009 at 20:53 Comment(4)
On a personal note - ..You say "I'm not a Git guru" - I notice that your answer is the accepted answer on virtually every Git related Q' on the site. I do very much appreciate your help. .."VonC (who usually thinks things thru very thoroughly)" - I should think so. How the heck does someone get 25,000+ points in a site with contributions from world experts?! I'm in awe! I appreciate his help as well. You guys rock. Now, if I can just get this to work... :) - Blackford
"..I'd use two Git repos" - I'm trying this, but (me the newb) am having trouble understanding. From this post I gather that to update the vendor project, I should subtree-merge their latest commits to the parent, create a branch of the parent called "vendor", and make my changes. They then pull in those changes from my "vendor" branch into their project. I didn't realize this was doable, and don't understand the point in making it a branch. Also didn't the other post avoid a subtree-merge? - Blackford
Please don't get upset, but do tell me if there is someplace to see the complete line by line workflow of this in action. (a theoretical example) [1. clone git://project.git 2. merge branch origin 3. commit origin master etc.] Thank you very much. - Blackford
There is quite a nice explanation of the workflow involved in using subtree merges in the section on Subtree Merging in Chapter 6 of the Pro Git (2009) book by Scott Chacon. Perhaps that is helpful. - Darya

If you are looking just at the original question's title: a good template for the vendor branch pattern with Git is at

https://www.roe.ch/Git_Reference

in the section "Vendor branch pattern".

Mildred answered 17/1, 2018 at 13:42 Comment(1)
this should be the accepted answer... the OP asks for a solution and people give alternatives that don't really fit the bill. I feel sorry for the OP, because this was the clear and simple solution. - Dominations

Subtree-merging cannot be used, as the subproject is being actively developed, and subtree merging makes it very difficult to merge those updates back into the original project.

The original question (from Apr 20, 2009) predates the announcement of git subtree by only 10 days. It's hard to tell exactly what the OP was looking for but git subtree could be the right answer.

Note that git subtree is a command-line tool. Nowadays it comes included with git. It uses subtree merging but it isn't the same thing. It has a git subtree push command that's designed for merging your local changes back into the upstream subproject.
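
A minimal sketch of that round trip, assuming `git subtree` is installed (some distributions package it separately from core Git). The repository names, the prefix `vendor/plugin` and the branch `from-parent` are hypothetical.

```shell
#!/bin/sh
# git subtree sketch: names and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# The upstream sub-project.
git init -q plugin
cd plugin
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
echo "v1" > plugin.txt
git add plugin.txt
git commit -q -m "plugin v1"
cd ..

# The parent project pulls the sub-project in under vendor/plugin.
git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
echo "parent" > README
git add README
git commit -q -m "parent initial"
git subtree add --prefix=vendor/plugin "$work/plugin" main

# A change to the sub-project is made inside the parent's working directory.
echo "v1 plus local fix" > vendor/plugin/plugin.txt
git commit -q -am "plugin: local fix made inside parent"

# Push only the vendor/plugin history back to a branch of the upstream repo.
git subtree push --prefix=vendor/plugin "$work/plugin" from-parent
git -C "$work/plugin" show from-parent:plugin.txt
```

The upstream maintainers can then review and merge `from-parent` like any other branch; updates in the other direction would use `git subtree pull` with the same prefix.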

I have been informed that the solution is known in the SVN world as "Vendor Branches", and that it is so simply done in Git so as to not even need addressing. Half-baked tutorials abound on the 'net.

I wrote https://david.rothlis.net/vendor-branch/ to explain vendor branches in git, and how they relate to git subtree.

Sidwell answered 5/10, 2022 at 9:49 Comment(0)
