My project uses over 100 git submodules. Which submodule alternative can handle a large number of repositories gracefully?
O

3

8

I've been researching git subtree and other alternatives to git submodules. My project has well over 100 submodules and it's very unwieldy to manage them all.

Can anyone recommend a workflow that works really well with a large number of repositories that need to be kept in sync?

Orford answered 6/5, 2015 at 18:39 Comment(0)
A
10

If your project has over 100 git submodules of components and dependencies, managing them will be unwieldy no matter which approach you use :-) I suggest looking for ways to script and automate as many parts as possible. Trust me, the novelty of playing with and chaining git commands wears off very quickly for most people, especially when deadlines are approaching. There is already a very good answer here on the comparison of the different approaches to manage git sub-projects.

Regarding workflow, I would first separate the repositories that are under your control from those that aren't, i.e. 3rd party repositories.

For 3rd party repositories which don't change often (either via merges or upstream PRs), you can still use submodules. Typically, you will pin these submodules to a stable tag. Syncing them is just a matter of running (or scripting) git submodule update --recursive --remote. Note that --remote moves each submodule to the tip of its tracked remote branch, so leave it out if you want to stay on the commits recorded in the super-project. If these 3rd party dependencies can be specified in a package management tool like bundler (for ruby projects), that will help to simplify your sub-project management.
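
Scripting that sync step could look roughly like the sketch below. It assumes Python 3 and git on the PATH; the helper names (submodule_update_cmd, sync_submodules) are made up for illustration, and only the git command itself comes from the paragraph above.

```python
import subprocess
from typing import List


def submodule_update_cmd(remote: bool = True, init: bool = False) -> List[str]:
    """Build the argv for `git submodule update --recursive [--remote] [--init]`."""
    cmd = ["git", "submodule", "update", "--recursive"]
    if remote:
        # --remote moves each submodule to the tip of its tracked remote branch.
        cmd.append("--remote")
    if init:
        # --init also initializes submodules that have not been set up yet.
        cmd.append("--init")
    return cmd


def sync_submodules(superproject_dir: str, remote: bool = True) -> None:
    """Run the update inside the super-project (hypothetical helper).

    Raises subprocess.CalledProcessError if git exits with a non-zero status.
    """
    subprocess.run(submodule_update_cmd(remote=remote), cwd=superproject_dir, check=True)
```

Calling sync_submodules("path/to/superproject") from a scheduled job or a make target is one way to stop people from typing the command by hand.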

For repositories that you own and that change often, gitslave and git-subtree are two alternatives, depending on your team's preferences.

gitslave multiplexes git operations across multiple repositories. In other words, when you branch, merge, commit, push, pull etc., each command is run on the parent project and all slaves in turn. This requires the team to work in a top-down manner, starting from the super-project down to the slaves.
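
To make the multiplexing idea concrete, here is a small Python sketch of "run the same git command on the parent, then on every child". This is not how gitslave itself is implemented; the function names and the repository paths in the usage note are assumptions for illustration only.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def run_git(repo_dir: str, args: List[str]) -> int:
    """Run `git <args>` inside repo_dir and return its exit code."""
    return subprocess.run(["git", *args], cwd=repo_dir).returncode


def multiplex(
    parent: str,
    children: Iterable[str],
    args: List[str],
    runner: Callable[[str, List[str]], int] = run_git,
) -> Dict[str, int]:
    """Run the same git command on the parent first, then on each child in turn.

    Returns a mapping of repository path to exit code so failures can be reported.
    """
    results: Dict[str, int] = {}
    for repo in [parent, *children]:
        results[repo] = runner(repo, args)
    return results
```

For example, multiplex("super", ["libs/a", "libs/b"], ["checkout", "-b", "feature-x"]) would create the same branch everywhere, which is exactly the top-down behaviour described above. The runner parameter is there so the loop can be exercised without touching real repositories.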

git-subtree uses Git’s subtree merge functionality to achieve an effect similar to submodules, by actually storing the files in the main repository and merging changes directly into that repository. The end result is a canonical repository with the option of including all the subprojects' history. In a way, this allows team members to focus more on the subtrees they are responsible for, but it requires extra work to merge back to the parent tree.
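
The day-to-day git subtree commands are add, pull and push, each taking a prefix (the directory inside the main repository), a repository and a ref. The sketch below only builds those command lines so they can be scripted; the helper name, the example URL and the prefix are placeholders, not anything from the answer.

```python
from typing import List


def subtree_cmd(
    action: str, prefix: str, repo_url: str, ref: str, squash: bool = False
) -> List[str]:
    """Build `git subtree <add|pull|push> --prefix=<prefix> <repo> <ref>`."""
    if action not in ("add", "pull", "push"):
        raise ValueError(f"unsupported subtree action: {action}")
    cmd = ["git", "subtree", action, f"--prefix={prefix}"]
    if squash and action != "push":
        # --squash imports the subproject as a single commit instead of its
        # full history; this sketch only passes it to add and pull.
        cmd.append("--squash")
    cmd += [repo_url, ref]
    return cmd


# Placeholder values, for illustration only.
add_cmd = subtree_cmd("add", "vendor/lib", "https://example.com/lib.git", "main", squash=True)
push_cmd = subtree_cmd("push", "vendor/lib", "https://example.com/lib.git", "main")
```

Whether to pass squash=True is the "option of including all the subprojects' history" mentioned above: without it, the subproject's commits are merged into the main repository's history.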

As a developer, my preference is to work at the lower sub-project level (to do my "red, green, refactor" cycle), and touch the parent projects only when necessary. But regardless of whether you choose a top-down or bottom-up workflow, try to identify repetitive, error-prone steps in your branching & merging strategy, and script them as much as possible.

Amnesty answered 8/5, 2015 at 6:38 Comment(4)
I had forgotten about gitslave. Funny, it's basically what we are doing now, but with a failure-prone script that's probably much worse than gitslave. – Orford
The only thing with using gitslave is that you might end up with many branches that serve no significant purpose. Anyway, there will be trade-offs in all approaches. Good luck! By the way, if you found the answer helpful, please consider upvoting and/or accepting. Thanks! – Amnesty
That's the problem we have right now: lots of branches, duplicated in all the submodules as well. I'm new to Stack Overflow; despite being a member for a long time, I've never asked a question before. How do I accept? It won't let me upvote because I do not have enough karma. – Orford
Yeah, that's the problem with a top-down workflow, say, by using gitslave to multiplex a command from the super-project. My preference is still a bottom-up approach where most of the team members only have to worry about the sub-projects they work with, and perform an upstream merge back to the super-project when needed. To accept an answer, click on the tick sign under the downvote arrow. – Amnesty
G
2

I've had the same issue, not with 100 submodules but with about 15-20. I built a CLI to assist with commit, push, pull, rebase, checkout, etc. I also use hard linking within my applications, so the CLI handles that too, but hard linking is not necessary. The CLI is written in Go and has releases for all sorts of OS platforms.

For my applications, my workflow usually has a ".boiler" folder where all my submodules go. I then hard link files within .boiler to the "src" of my application, so when I make edits to the linked file, it updates the source file, which is in the git submodule.
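
For anyone unfamiliar with hard links, here is a rough Python sketch of the idea. It is not taken from the CLI; the hard_link helper and the file names are invented for the demo, and it assumes both paths live on the same filesystem, which hard links require.

```python
import os
import tempfile


def hard_link(source: str, link_path: str) -> None:
    """Create link_path as a hard link to source, replacing any existing file there."""
    os.makedirs(os.path.dirname(link_path) or ".", exist_ok=True)
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.link(source, link_path)


# Demo in a throwaway directory: .boiler/shared/util.txt is linked into src/.
with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, ".boiler", "shared", "util.txt")
    os.makedirs(os.path.dirname(source))
    with open(source, "w") as f:
        f.write("v1\n")

    link = os.path.join(root, "src", "util.txt")
    hard_link(source, link)

    # Appending through the link changes the file inside the .boiler folder too,
    # because both names refer to the same file on disk.
    with open(link, "a") as f:
        f.write("v2\n")
    with open(source) as f:
        content = f.read()
    same_file = os.path.samefile(source, link)
```

One caveat: tools that save by writing a new file and renaming it over the old one break the link, so the two paths can silently drift apart.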

Here's the link to the CLI with install instructions. Of course, you can also just download the release and add it to any directory that's on your PATH.

https://github.com/ml27299/lit-cli

Gallic answered 10/2, 2019 at 19:24 Comment(0)
V
1

It is far better to use a monorepo. The one sensible reason for submodules would be if you need different packages to have different access privileges.

If this is the case, then split the code into separate monorepos based on access privileges. Then use https://github.com/ingydotnet/git-subrepo to bring all of those monorepos into a single monorepo.

Vivienviviene answered 15/2, 2021 at 22:48 Comment(0)
