Is it possible to patch a submodule in Git from the parent project?

I have a project, main, that contains a submodule, foo. I would like to make a small change to foo that applies only to this particular project, main.

main/
  + .git
  + main.c
  + lib/
  |   + bar.c
  + foo/           # My `foo` submodule
      + .git
      + config.h   # The file I want to patch from `main`
      + ...

A common solution would be to go down into my submodule, make a commit "Applied patch for main" on a new branch called main-project, then push it. Unfortunately, this is a very bad approach because I am making changes to foo that only matter to main. Also, when I update foo to the latest version, I will have to cherry-pick the patch too, which introduces a lot of noise into foo's history.

Another solution is to have a real patch file on main that is applied to foo just before the build. Unfortunately, this modifies the submodule content and leaves uncommitted changes in foo, so it is not a good solution either.

The ideal solution would be to track my patch using Git, but at the top level (i.e. directly on main, not on foo). Theoretically, it would be possible to add a blob to the Git tree that points into the submodule location:

blob   <sha> main.c
tree   <sha> lib/
commit <sha> foo
blob   <sha> foo/config.h

With this idea, the patched config.h belonging to foo would be tracked on main.

How can this be done?

Plauen asked 29/11, 2017 at 21:46. Comments:
I don't know many people who regard the records VCSs keep of what's been done as noise; most people I know regard those as a selling point. If you're going to update your changes for a new vendor release (apply your patches on a new base), I'd think you'd want a VCS to record what's been done, so others (perhaps just you six weeks from now) can reproduce and test the work. – Merrell
@Merrell When the upstream repository is huge and changes often and your patches are small and rare, it makes a lot of sense, maintenance-wise. Please post your concerns in a new question; I'd like to answer in more detail. – Shwalb
P.S.: I find patch versioning an interesting concept, especially since I only understood recently why (I think that) it is done. – Shwalb

I would still go with the second option (have a real patch file on main), but adapt my build process to:

  • make a copy of the config.h in the submodule
  • apply the patch
  • build
  • restore config.h to its original content.

That way, I keep the submodule status unchanged.
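The four steps above could be scripted roughly as follows. This is a minimal sketch, assuming POSIX `patch` is installed and that the patch targets only config.h; `build_with_patch`, `fake_build`, the paths and the BUFFER_SIZE values are hypothetical stand-ins, not part of the original answer. The original file is put back whether the patch or the build succeeds or fails, so the submodule ends up unchanged either way.

```shell
set -eu

# build_with_patch SUBMODULE PATCHFILE BUILD-COMMAND...
# Hypothetical helper: patch SUBMODULE/config.h, run the build, then restore
# the original file whether or not the patch or the build succeeded.
build_with_patch() {
    sub=$1
    patchfile=$2
    shift 2
    cp -p "$sub/config.h" "$sub/config.h.orig"      # 1. keep a pristine copy
    status=0
    if patch "$sub/config.h" < "$patchfile"; then   # 2. apply the patch
        "$@" || status=$?                           # 3. build
    else
        status=$?                                   # patch failed: skip the build
    fi
    mv "$sub/config.h.orig" "$sub/config.h"         # 4. restore the original content
    return "$status"
}

# Demo with throwaway files (hypothetical content)
demo=$(mktemp -d)
mkdir "$demo/foo"
printf '#define BUFFER_SIZE 64\n' > "$demo/foo/config.h"
cat > "$demo/myconfig.patch" <<'EOF'
--- config.h
+++ config.h
@@ -1 +1 @@
-#define BUFFER_SIZE 64
+#define BUFFER_SIZE 4096
EOF

# Stand-in for the real build command (e.g. make): record what the build saw
fake_build() { cp "$demo/foo/config.h" "$demo/seen-by-build.h"; }

build_with_patch "$demo/foo" "$demo/myconfig.patch" fake_build
```

In a real build, `fake_build` would be replaced by something like `make`. An interrupted build (Ctrl-C) is not handled here; a `trap` could cover that case too.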

The OP adds in the comments:

But your solution is not working in an IDE, IntelliSense will be confused

True: for that, I would automatically apply the patch on checkout and remove it on checkin, through a smudge/clean content filter driver.
That way, the patch remains in place during the whole session, but disappears from any git status/diff/checkin.

This is not ideal though, and there does not seem to be a native Git way to handle this.
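A minimal sketch of that smudge/clean idea, assuming POSIX `patch` is installed and that the patch applies cleanly to the committed config.h. The driver name `mainpatch`, the script `patch-filter.sh`, the patch content and the throwaway repository standing in for foo are all hypothetical. Smudge applies the patch when Git writes config.h to disk; clean applies it in reverse when Git reads the file back, so `git status` and `git diff` only ever see upstream's content. The driver lives in foo's local config and its untracked `info/attributes` (for a real submodule, `git -C foo rev-parse --absolute-git-dir` points inside main's `.git/modules`), so nothing is committed to foo; the flip side is that every clone has to repeat this setup, for example from a script tracked in main.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Throwaway stand-in for the foo submodule, with upstream's config.h committed
git init -q foo
printf '#define BUFFER_SIZE 64\n' > foo/config.h
git -C foo add config.h
git -C foo commit -qm 'upstream foo'

# The patch that main would track (hypothetical content)
cat > myconfig.patch <<'EOF'
--- config.h
+++ config.h
@@ -1 +1 @@
-#define BUFFER_SIZE 64
+#define BUFFER_SIZE 4096
EOF

# Filter program: file content on stdin, result on stdout.
# "patch-filter.sh PATCH" patches (smudge); "patch-filter.sh PATCH -R" unpatches (clean).
cat > patch-filter.sh <<'EOF'
#!/bin/sh
patchfile=$1
shift
tmp=$(mktemp) || exit 1
cat > "$tmp"
# patch reports progress on stdout, which must carry only file content here
if patch -N "$@" "$tmp" < "$patchfile" > /dev/null; then
    cat "$tmp"
    status=0
else
    status=1
fi
rm -f "$tmp" "$tmp.orig" "$tmp.rej"
exit "$status"
EOF

# Register the driver in foo's local config and untracked info/attributes,
# so no tracked file or commit in foo changes
gitdir=$(git -C foo rev-parse --absolute-git-dir)
mkdir -p "$gitdir/info"
echo 'config.h filter=mainpatch' >> "$gitdir/info/attributes"
git -C foo config filter.mainpatch.smudge "sh '$demo/patch-filter.sh' '$demo/myconfig.patch'"
git -C foo config filter.mainpatch.clean "sh '$demo/patch-filter.sh' '$demo/myconfig.patch' -R"

# Check config.h out again so the smudge filter patches the working copy
rm foo/config.h
git -C foo checkout -- config.h
```

If an upstream update makes the patch stop applying, the filter exits non-zero; since `filter.mainpatch.required` is not set, Git should fall back to the unfiltered content, so it is worth checking that the patch is still present after updating foo.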

Aixlachapelle answered 30/11, 2017 at 5:54. Comments:
As you said, it is not ideal. I am wondering whether it would be possible to propose this feature to Git. – Plauen
@Plauen I don't see it as a native feature of Git, but this workaround should be enough: if scripted, the script will be part of your sources, making the process clearer. – Aixlachapelle
Yeah, but your solution is not working in an IDE, IntelliSense will be confused. – Plauen
@Plauen Agreed. For that you would need to apply that patch on checkout, and remove it on checkin: that is the content filter driver approach that I describe for instance here: https://mcmap.net/q/537000/-branch-specific-configuration-file-maintenance-with-git – Aixlachapelle
Wait, IntelliSense isn't smart enough to apply patches during a build? And this is somehow Git's fault? – Merrell

The simplest way to carry project-specific patches on a vendor history is to clone the vendor repo, carry the changes as a project branch and advertise that clone as the .gitmodules upstream.

This makes working on your changes to the vendor project perfectly ordinary: git clone --recurse-submodules yourproject works fine, and your submodule changes can be pushed back to your project's submodule upstream (the submodule repo's origin remote). Everything works.

The only additional step is that, to update your project's version of the submodule to the latest vendor code, somebody has to fetch and merge from the (further-upstream) vendor repo...

... but that's also perfectly ordinary: to fetch and merge from the vendor repo, you just do it. git remote add vendor u://r/l; git fetch vendor; git merge vendor/master. Or, if you prefer rebasing to merging, do that. Once you've done that, push the results to your submodule's origin and update your project's version of the submodule, all as usual.
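Here is that workflow as a self-contained local sketch. Every repository is a throwaway stand-in (assumptions, not from the answer): `vendor` plays the vendor repo at u://r/l, `fork.git` plays your clone that main's .gitmodules would advertise (for example via `git submodule set-url` in recent Git versions), and `foo` plays the submodule checkout whose origin is that clone. Committing the new submodule version in main is left out.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Stand-in for the vendor's repository (u://r/l in the answer)
git init -q -b master vendor
printf '#define BUFFER_SIZE 64\n' > vendor/config.h
git -C vendor add config.h
git -C vendor commit -qm 'vendor: initial'

# Your published clone of it: the URL main's .gitmodules would point at
git clone -q --bare vendor fork.git

# The submodule checkout inside main; its origin is your clone
git clone -q fork.git foo
cd foo
printf '#define BUFFER_SIZE 4096\n' > config.h
git commit -qam 'Applied patch for main'
git push -q origin master

# Later, the vendor publishes new work...
echo 'int feature(void);' > ../vendor/feature.h
git -C ../vendor add feature.h
git -C ../vendor commit -qm 'vendor: add feature'

# ...and someone fetches and merges it, then pushes to origin as usual
git remote add vendor "$demo/vendor"
git fetch -q vendor
git merge -q --no-edit vendor/master
git push -q origin master
```

Swapping the merge for `git rebase vendor/master` also works, but then the push to origin has to be forced because already-published commits get rewritten.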

Merrell answered 15/9, 2019 at 18:34

The ideal solution would be to track my patch using Git, but at the top level (i.e. directly on main, not on foo). Theoretically, it would be possible to add a blob to the Git tree that points into the submodule location:

Even if this is doable, personally I find it very convoluted. Unless this was implemented natively and included user-facing commands that make it easy to understand and manage, I'd stay away from it.

Alternative 1: Patch

Is there a more elegant/streamlined way of doing a patch in a git submodule, other than the accepted answer?

If you would like to avoid messing with the submodule at all, I would suggest copying over/checking out the worktree somewhere else and using only that during the build. That way, the submodule is always "clean" (and perhaps "immutable", from main's point of view) and you only have to worry about it in the build directory.

Simplified build process example:

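cd main
rm -rf build               # start from a fresh copy so the patch never applies twice
mkdir -p build
cp -R foo/. build/         # copy foo's contents (not the foo directory itself) into build/
cp myconfig.patch build/
cd build
patch <myconfig.patch
make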

Note that this builds only foo, and that main's build process does not need to be altered besides having to point to build/ instead of foo/.

If you do not intend to modify foo itself/would rather keep it "pristine", you could also turn it into a bare repository and use GIT_WORK_TREE="$PWD/build" git checkout -f HEAD instead of cp, so that it is only checked out during the build (-f forces every file to be written out, as is usual when checking out a bare repository into a separate directory). This is similar to how makepkg(8) does it (at least in my experience with the AUR) in order to avoid modifying the original sources ($source array vs $srcdir). It also splits source retrieval from the build itself (prepare() vs build()). See also PKGBUILD(5) and Creating packages. In your case, development and an IDE are also involved, so it might be trickier if you want to inspect both the original and the build files at once.
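A rough sketch of the bare-repository variant, assuming POSIX `patch` is installed. The repositories, `myconfig.patch` and its content are throwaway stand-ins; it uses GIT_DIR and GIT_WORK_TREE so that it can run from anywhere instead of from inside the bare repository. foo exists only as `foo.git`, and every build starts from a fresh checkout in `build/`, so the sources themselves are never modified.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Throwaway stand-in for foo's history
git init -q foo
printf '#define BUFFER_SIZE 64\n' > foo/config.h
git -C foo add config.h
git -C foo commit -qm 'upstream foo'

# Keep foo only as a bare repository: there is no working tree to get dirty
git clone -q --bare foo foo.git
rm -rf foo

cat > myconfig.patch <<'EOF'
--- config.h
+++ config.h
@@ -1 +1 @@
-#define BUFFER_SIZE 64
+#define BUFFER_SIZE 4096
EOF

# Build step: check foo out into a fresh build/ directory and patch it there
rm -rf build
mkdir build
GIT_DIR="$PWD/foo.git" GIT_WORK_TREE="$PWD/build" git checkout -q -f HEAD
patch build/config.h < myconfig.patch
# ...then run the real build in build/ (e.g. make -C build)
```

Git records the checkout in an index file inside `foo.git`, but no commits or branches change there.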

Pros:

  • Sources are separated from the build files
  • main does not affect foo
  • Does not depend on git/makes it merely a build automation issue
  • Only requires a patch file

Cons:

  • Need to keep the patch file updated (vs rebasing changes)
  • Need to change the build process

I'd go with this one if your patches are small and/or very specific to main.

P.S.: It is possible to go one step further and track foo's version directly in the build process instead of using submodules if you wanted to:

Move foo one directory up, then in the build process:

cd build
GIT_DIR='../../foo/.git' git checkout -f "$myrev"   # -f: build/ starts out empty, so force every file to be written
patch <myconfig.patch
make

Alternative 2: Separate Branch

Also, when I update foo to the latest version, I will have to cherry-pick the patch too which introduces a lot of noise in foo's history.

You don't really have to cherry-pick it: you could just keep the changes in your branch instead and merge master into it every once in a while.

Personally, I would avoid this unless your changes are much more significant than the noise caused by keeping it in sync (i.e.: the merges and conflicts). I find merge commits to be very opaque, especially when conflicts are involved, as unrelated/accidental changes are harder to detect.

Rebasing your commits onto master is also an option.
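A minimal local sketch of this alternative, assuming foo's upstream branch is master and main's changes live on main-project (names from the question). The repository, feature.c and the BUFFER_SIZE values are hypothetical. It takes the merge route and shows the rebase route as a comment.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
git init -q -b master "$demo/foo"
cd "$demo/foo"
printf '#define BUFFER_SIZE 64\n' > config.h
git add config.h
git commit -qm 'upstream: initial'

# Keep main's change on its own branch
git checkout -q -b main-project
printf '#define BUFFER_SIZE 4096\n' > config.h
git commit -qam 'Applied patch for main'

# Meanwhile upstream's master moves on
git checkout -q master
echo 'int feature(void) { return 1; }' > feature.c
git add feature.c
git commit -qm 'upstream: add feature'

# Every once in a while, bring master into the branch with a merge...
git checkout -q main-project
git merge -q --no-edit master
# ...or, instead of merging, replay the patch on top of master:
#   git rebase master
```

Each such merge adds a merge commit to foo's history, which is the noise the cons below refer to.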

Pros:

  • No need for a separate repository
  • Keeps the worktree in the same place (no need to mess with your IDE)

Cons:

  • Pollutes foo's repository with unrelated commits (when merging)
  • Pollutes foo's repository with unrelated commit objects (when rebasing)
  • Murky history of the evolution of your changes to config.h (when rebasing)

Alternative 3: Soft Fork

Also, when I update foo to the latest version, I will have to cherry-pick the patch too which introduces a lot of noise in foo's history.

Unfortunately, this is a very bad approach because I am making changes to foo that only matter to main

If you want to change foo to suit main, but not mess with foo upstream, why not create a soft-fork of foo? If you do not care too much about foo-fork's history, you could just commit your changes on the main-project branch and keep it in sync with foo's master through rebase:

Creating the fork:

cd foo
git remote add foo-fork 'https://foo-fork.com'
git branch main-project master
git push -u foo-fork main-project

Keeping it in sync:

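git checkout main-project
git pull --rebase origin master   # origin: the upstream foo remote of the submodule
# (resolve the conflicts, if any)
git push --force-with-lease foo-fork main-project   # the rebase rewrote already-pushed commits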

Pros:

  • Easy to sync with upstream (e.g.: with pull --rebase)
  • Keeps the worktree in the same place (no need to mess with your IDE)

Cons:

  • Murky history of the evolution of your changes to config.h (because of rebasing)

The added benefit of using a patch instead of rebasing is that you keep the history of the patch. But if you want to keep things really simple sync-wise, I suppose that this is the way.

Alternative 4: Hard Fork

If you find that foo changes too much/too often and/or you need to patch too many things, your best bet is probably creating a full fork and cherry-picking their changes instead.

Shwalb answered 15/9, 2019 at 3:5

You might find it makes more sense to use a git subtree rather than a submodule. Subtree copies the other project into a location in the local repository, rather than managing it as a separate repo inside the current repo.

You can commit your change directly on main's master branch, then periodically update by merging in changes from upstream.
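A local sketch of the subtree workflow, assuming the `git subtree` command is available (it ships in Git's contrib area and some distributions package it separately). The repositories `foo-upstream` and `main`, the file contents and the commit messages are throwaway stand-ins. foo's files become ordinary tracked files in main, so the main-only change is just a normal commit in main.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Stand-in for foo's upstream repository
git init -q -b master foo-upstream
printf '#define BUFFER_SIZE 64\n' > foo-upstream/config.h
git -C foo-upstream add config.h
git -C foo-upstream commit -qm 'foo: initial'

# main, with some history of its own
git init -q main
cd main
echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -qm 'main: initial'

# Copy foo into main/foo (--squash folds upstream history into one commit)
git subtree add --prefix=foo --squash "$demo/foo-upstream" master

# The main-only change is an ordinary commit in main
printf '#define BUFFER_SIZE 4096\n' > foo/config.h
git commit -qam 'Patch foo/config.h for main'

# Upstream moves on...
echo 'int feature(void);' > ../foo-upstream/feature.h
git -C ../foo-upstream add feature.h
git -C ../foo-upstream commit -qm 'foo: add feature'

# ...and main periodically merges those changes in
git subtree pull --prefix=foo --squash -m 'Merge upstream foo' "$demo/foo-upstream" master
```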

Americaamerican answered 13/9, 2019 at 13:30
