Is it possible to manage multiple repositories coherently?

I have a collection of git repositories that are independently versioned but related to one another. For example, when adding a feature I may have to add functionality to a shared library, and then extend an application or service to take advantage of that functionality.

Currently, I would have to create a branch on each of the repositories that I touch while working on that feature. What I would like to do, however, is to simplify the process by branching and merging those multiple repositories at once (to reduce the likelihood of forgetting to branch, or committing/merging in one repository but not another).

Is there a simple way to branch and merge multiple repositories at once, or is this a task better suited for a collection of helper scripts? If the latter, are there any scripts available that already accomplish this?

Emulsifier answered 3/2, 2012 at 21:58 Comment(1)
possible duplicate of Managing many git repositories - Kingsize
5

There isn't a built-in way to deal with multiple repositories at once

But there will be.

Git 2.27 (Q2 2020) paves the way with "git update-ref --stdin" which learned a handful of new verbs to let the user control ref update transactions more explicitly.

That helps as an ingredient to implement two-phase commit-style atomic ref-updates across multiple repositories.

See commit e48cf33, commit 94fd491, commit de0e0d6, commit 804dba5, commit 5ae6c5a, commit a65b8ac (02 Apr 2020), and commit bd021f3, commit faa35ee, commit edc3069 (30 Mar 2020) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit d2ea03d, 29 Apr 2020)

update-ref: implement interactive transaction handling

Signed-off-by: Patrick Steinhardt

The git-update-ref command can only handle queueing transactions right now via its "--stdin" parameter, but there is no way for users to handle the transaction itself in a more explicit way.

E.g. in a replicated scenario, one may imagine a coordinator that spawns git-update-ref for multiple repositories and only if all agree that an update is possible will the coordinator send a commit.

Such a transactional session could look like

> start
< start: ok
> update refs/heads/master $OLD $NEW
> prepare
< prepare: ok
# All nodes have returned "ok"
> commit
< commit: ok

or:

> start
< start: ok
> create refs/heads/master $OLD $NEW
> prepare
< fatal: cannot lock ref 'refs/heads/master': reference already exists
# On all other nodes:
> abort
< abort: ok

In order to allow for such transactional sessions, this commit introduces four new commands for git-update-ref, which match those we have internally already with the exception of "start":

  • start: start a new transaction
  • prepare: prepare the transaction, that is try to lock all references and verify their current value matches the expected one
  • commit: explicitly commit a session, that is update references to match their new expected state
  • abort: abort a session and roll back all changes

By design, git-update-ref will commit as soon as standard input is being closed.
While fine in a non-transactional world, it is definitely unexpected in a transactional world.
Because of this, as soon as any of the new transactional commands is used, the default will change to aborting without an explicit "commit".
To avoid a race between queueing updates and the first "prepare" that starts a transaction, the "start" command has been added to start an explicit transaction.

Add some tests to exercise this new functionality.
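As a rough illustration of the coordinator idea described in that commit message, here is a minimal Python sketch, not an official tool. It assumes Git 2.34 or later (so that the status lines are flushed), and the names RefSession, decide and update_all are made up for this example. Each instruction line must follow the syntax documented in the git update-ref man page.

```python
import subprocess
from typing import Dict, Iterable, List


class RefSession:
    """One interactive `git update-ref --stdin` process for one repository."""

    def __init__(self, repo: str) -> None:
        self.proc = subprocess.Popen(
            ["git", "-C", repo, "update-ref", "--stdin"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1,
        )

    def send(self, line: str) -> None:
        try:
            self.proc.stdin.write(line + "\n")
            self.proc.stdin.flush()
        except (BrokenPipeError, ValueError):
            pass  # git already exited; the next ack() reports the failure

    def ack(self, verb: str) -> bool:
        # readline() returns "" when git died (e.g. "fatal: cannot lock ref")
        return self.proc.stdout.readline().strip() == f"{verb}: ok"

    def close(self) -> None:
        try:
            self.proc.stdin.close()
        except BrokenPipeError:
            pass
        self.proc.stdout.close()
        self.proc.wait()


def decide(acks: Iterable[bool]) -> str:
    """Two-phase decision: commit only if every repository prepared."""
    return "commit" if all(acks) else "abort"


def update_all(updates: Dict[str, List[str]]) -> bool:
    """`updates` maps a repository path to its update-ref instruction lines."""
    sessions = [(RefSession(repo), lines) for repo, lines in updates.items()]
    acks = []
    for session, lines in sessions:
        session.send("start")
        started = session.ack("start")
        for line in lines:
            session.send(line)
        session.send("prepare")
        prepared = session.ack("prepare")
        acks.append(started and prepared)
    verb = decide(acks)
    results = []
    for session, _ in sessions:
        session.send(verb)  # phase two: the same verdict goes to every node
        results.append(session.ack(verb))
        session.close()
    return verb == "commit" and all(results)
```

Note that this only coordinates the "prepare" votes. If a "commit" fails in one repository after the others have committed, the sketch reports False but does not undo anything.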


The work toward a built-in way to deal with multiple repositories at once continues with Git 2.28 (Q3 2020), which adds a new hook.

See commit 6754159 (19 Jun 2020) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 33a22c1, 06 Jul 2020)

refs: implement reference transaction hook

Signed-off-by: Patrick Steinhardt

The low-level reference transactions used to update references are currently completely opaque to the user.
While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction.

One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them.

While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this.

The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism.

The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates.

While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely.

Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service.
When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero.
The most important upside is that this will catch all commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism.
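To make the voting idea more concrete, here is a small Python sketch of the logic such a "reference-transaction" hook could contain. It relies on the stdin format documented in githooks (one "<old-value> <new-value> <ref-name>" line per queued update). Everything else is an assumption for illustration: the names parse_updates, vote_digest and hook_exit_code are invented, and cast_vote is a hypothetical placeholder for whatever central service you would actually contact.

```python
import hashlib
from typing import List, NamedTuple


class RefUpdate(NamedTuple):
    old: str
    new: str
    ref: str


def parse_updates(stdin_text: str) -> List[RefUpdate]:
    """Parse the '<old-value> <new-value> <ref-name>' lines git sends on stdin."""
    updates = []
    for line in stdin_text.splitlines():
        if line.strip():
            old, new, ref = line.split(" ", 2)
            updates.append(RefUpdate(old, new, ref))
    return updates


def vote_digest(updates: List[RefUpdate]) -> str:
    """Digest of the queued updates; replicas with equal updates get equal digests."""
    h = hashlib.sha256()
    for u in updates:
        h.update(f"{u.old} {u.new} {u.ref}\n".encode())
    return h.hexdigest()


def cast_vote(digest: str) -> bool:
    """HYPOTHETICAL placeholder: a real hook would ask the voting service here."""
    return True


def hook_exit_code(state: str, stdin_text: str) -> int:
    """Exit code for the hook, given its state argument and its stdin."""
    if state != "prepared":
        return 0  # the exit code is ignored for "committed" and "aborted"
    agreed = cast_vote(vote_digest(parse_updates(stdin_text)))
    return 0 if agreed else 1
```

To use it as a real hook, the script saved as .git/hooks/reference-transaction would finish by calling sys.exit(hook_exit_code(sys.argv[1], sys.stdin.read())).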

In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduces two new performance tests for git-update-ref.
Run against an empty repository, it produces the following results:

Test                         origin/master     HEAD
--------------------------------------------------------------------
1400.2: update-ref           2.70(2.10+0.71)   2.71(2.10+0.73) +0.4%
1400.3: update-ref --stdin   0.21(0.09+0.11)   0.21(0.07+0.14) +0.0%  

The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-ref over 3000 invocations.
p1400.3 instead calls git-update-ref --stdin three times and queues a thousand creations, updates and deletes respectively.

As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup.
On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead.


With Git 2.29 (Q4 2020), the hook is simplified by removing an ineffective optimization.

See commit 0a0fbbe (25 Aug 2020) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 6ddd76f, 31 Aug 2020)

refs: remove lookup cache for reference-transaction hook

Signed-off-by: Patrick Steinhardt

When adding the reference-transaction hook, there were concerns about the performance impact it may have on setups which do not make use of the new hook at all.
After all, it gets executed every time a reftx is prepared, committed or aborted, which linearly scales with the number of reference-transactions created per session.
And as there are code paths like git push which create a new transaction for each reference to be updated, this may translate to calling find_hook() quite a lot.

To address this concern, a cache was added with the intention to not repeatedly do negative hook lookups.
Turns out this cache caused a regression, which was fixed via e5256c82e5 ("refs: fix interleaving hook calls with reference-transaction hook", 2020-08-07, Git v2.29.0 -- merge listed in batch #8).

In the process of discussing the fix, we realized that the cache doesn't really help even in the negative-lookup case.
While performance tests added to benchmark this did show a slight improvement in the 1% range, this really doesn't warrant having a cache.
Furthermore, it's quite flaky, too. E.g. running it twice in succession produces the following results:

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.79(2.16+0.74)   2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin   0.22(0.08+0.14)   0.21(0.08+0.12) -4.5%

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.70(2.09+0.72)   2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin   0.21(0.10+0.10)   0.21(0.08+0.13) +0.0%

One case notably absent from those benchmarks is a single executable searching for the hook hundreds of times, which is exactly the case for which the negative cache was added.
p1400.2 will spawn a new update-ref for each transaction and p1400.3 only has a single reference-transaction for all reference updates.
So this commit adds a third benchmark, which performs a non-atomic push of a thousand references. This will create a new reference transaction per reference. But even for this case, the negative cache doesn't consistently improve performance:

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push       6.63(6.50+0.13)   6.81(6.67+0.14) +2.7%
1400.4: nonatomic push       6.35(6.21+0.14)   6.39(6.23+0.16) +0.6%
1400.4: nonatomic push       6.43(6.31+0.13)   6.42(6.28+0.15) -0.2%

So let's just remove the cache altogether to simplify the code.


With Git 2.30 (Q1 2021), "git update-ref --stdin" learns to take multiple transactions in a single session.

See commit 8c4417f, commit 2102043, commit 262a4d2, commit c0e1726 (13 Nov 2020) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 1bc550e, 08 Dec 2020)

update-ref: allow creation of multiple transactions

Signed-off-by: Patrick Steinhardt
Reviewed-by: Jeff King

While git-update-ref has recently grown commands which allow interactive control of transactions in e48cf33b61 ("update-ref: implement interactive transaction handling", 2020-04-02, Git v2.27.0-rc0 -- merge listed in batch #5), it is not yet possible to create multiple transactions in a single session. To do so, one currently still needs to invoke the executable multiple times.

This commit addresses this shortcoming by allowing the "start" command to create a new transaction if the current transaction has already been either committed or aborted.

The git update-ref man page now includes, in the description of the start command:

explicit commit. This command may create a new empty transaction when the current one has been committed or aborted already.
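For instance, with Git 2.30 or later a single non-interactive session can carry several transactions back to back. The Python sketch below only builds the stdin text and pipes it to git; session_script and run_session are names invented for this example, and the instruction lines you pass in must follow the syntax from the git update-ref man page.

```python
import subprocess
from typing import Iterable, List


def session_script(transactions: Iterable[Iterable[str]]) -> str:
    """Build stdin for one `git update-ref --stdin` session.

    Each inner iterable is one transaction: it is wrapped in "start" and
    "commit" so the next "start" can open a fresh transaction.
    """
    lines: List[str] = []
    for instructions in transactions:
        lines.append("start")
        lines.extend(instructions)
        lines.append("commit")
    return "".join(line + "\n" for line in lines)


def run_session(repo: str, transactions: Iterable[Iterable[str]]) -> bool:
    """Run all transactions in one git process; True if git exited with 0."""
    result = subprocess.run(
        ["git", "-C", repo, "update-ref", "--stdin"],
        input=session_script(transactions),
        text=True, capture_output=True,
    )
    return result.returncode == 0
```

Before Git 2.30 the same effect needed one git update-ref invocation per transaction.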


Before Git 2.34 (Q4 2021), "git update-ref --stdin" failed to flush its output as needed, which could lead the conversation to a deadlock.

That has been fixed; the bug was a consequence of the patch mentioned before.

See commit efa3d64 (03 Sep 2021) by Patrick Steinhardt (pks-t).
See commit 7c12007 (15 Sep 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 06a0eea, 23 Sep 2021)

update-ref: fix streaming of status updates

Signed-off-by: Patrick Steinhardt

When executing git-update-ref(1) with the --stdin flag, then the user can queue updates and, since e48cf33 ("update-ref: implement interactive transaction handling", 2020-04-02, Git v2.27.0-rc0 -- merge listed in batch #5), interactively drive the transaction's state via a set of transactional verbs.
This interactivity is somewhat broken though: while the caller can use these verbs to drive the transaction's state, the status messages which confirm that a verb has been processed are not flushed.
The caller may thus be left hanging waiting for the acknowledgement.

Fix the bug by flushing stdout after writing the status update.

Kierakieran answered 2/5, 2020 at 18:54 Comment(4)
Dude, awesome answer! - Alsatian
@Alsatian Thank you. And thanks to Patrick Steinhardt's great work on that. - Kierakieran
To clarify: if I have a git superproject with 2 git submodules, compA/ and compB/, and I make a commit involving these 3 git repos, I am guessing that involves 3 commits: commit-hash-compA, commit-hash-compB, and a commit to the superproject, say commit-superproject. When pushing and pulling, these need to be pushed/pulled/merged together. So if I push this changeset that involves these 3 commits while someone else is trying to do something similar from their own git workspace, both these pull requests need to land in the central repository (or a 3rd repository) atomically. Is that possible? - Interpellation
@SarviShanmugham Not sure: you would need to test that scenario. The update-ref is very much a work in progress. - Kierakieran
7

There isn't a built-in way to deal with multiple repositories at once, and if you think about the distributed nature of git, there really isn't anything but social convention that can define that. (Consider: what if someone pulls one repository from you, but pulls the second from another remote? Can you still have this coherence?)

You might find a tool like mr, which works with multiple repositories at the same time, helpful.
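If you would rather go the helper-script route mentioned in the question, the core of it is small. The following Python sketch is not how mr works internally; it just runs the same branch-creation command in each repository. The function names and the repository paths in the note below are made up for illustration.

```python
import subprocess
from typing import Dict, List, Sequence


def branch_commands(repos: Sequence[str], branch: str) -> List[List[str]]:
    """One `git checkout -b <branch>` command per repository path."""
    return [["git", "-C", repo, "checkout", "-b", branch] for repo in repos]


def run_everywhere(commands: List[List[str]]) -> Dict[str, int]:
    """Run each command and map the repository path to git's exit status."""
    return {cmd[2]: subprocess.run(cmd).returncode for cmd in commands}
```

For example, run_everywhere(branch_commands(["shared-lib", "my-app"], "feature-x")) would try to create the branch in both checkouts. Nothing here is atomic: if one repository fails, the others keep their new branch, so check the returned statuses.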

If you really have things that are tied that tightly, though, I would advise you to put them into one repository. That way you can't forget any of the steps, because they happen in one atomic operation.

If your code isn't actually that tightly tied together, though, then give up the "must branch at exactly the same time" notion and you will be happier.

Ankney answered 3/2, 2012 at 22:13 Comment(1)
+1 mr is pretty awesome, though it doesn't work all too well on Windows - Altocumulus
5

You can use the repo tool: https://gerrit.googlesource.com/git-repo/

  • set the same branch on multiple projects
  • view state of multiple repositories (branches, unmerged commits) with one command
  • manage a set of repositories (clone, sync, etc) with manifest files
  • lots of other cool stuff
Bogbean answered 5/2, 2012 at 1:18 Comment(0)
3

Also, take a look at: http://fabioz.github.io/mu-repo/ -- it's a tool similar to mr and repo (I couldn't get those to work as I needed :)

Some notes:

  • it's written in Python (so it works well on any OS where Python runs: Linux, Windows, Mac...)

  • besides running common git operations against many repos at once, it also provides workflows to:

    • clone multiple repositories
    • open URLs from repos (so it's possible to create pull requests for multiple repos at once)
    • diff changes on multiple repos at once with WinMerge or Meld
    • execute non-git commands
Afforest answered 20/7, 2012 at 19:42 Comment(0)
0

Just an idea: I haven't tried this myself, but I came across this feature some time ago. Try adding all the wanted repositories as submodules to a new "root" repository and use the following command:

git submodule foreach --recursive <command>

Consult the help for this command and for how to deal with submodules. You can run whatever command you want, git or non-git.
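If you want to drive this from a script, a thin Python wrapper could look like the sketch below. The function names and the "root" path are assumptions for illustration; the git invocation itself is just the foreach command shown above, with the command passed as a single string that git hands to the shell inside each submodule.

```python
import subprocess
from typing import List


def foreach_command(root: str, command: str) -> List[str]:
    """argv for `git submodule foreach --recursive <command>` run in `root`."""
    return ["git", "-C", root, "submodule", "foreach", "--recursive", command]


def run_in_submodules(root: str, command: str) -> int:
    """Run `command` in every submodule of `root`; returns git's exit status."""
    return subprocess.run(foreach_command(root, command)).returncode
```

For example, run_in_submodules("root", "git checkout -b feature-x") would attempt to create the same branch in every submodule. By default foreach stops at the first submodule where the command fails, so a partial result is possible.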

Taber answered 25/8, 2017 at 14:21 Comment(0)
0

The Android repo tool manages the creation of feature branches across multiple repositories. However, the repo upload command can only publish to Gerrit code review.

If you use an ordinary Git repository manager like GitHub or GitLab, repo upload will not work; you need the nonexistent repo push command. I implemented repo push in the Wave Computing fork of repo. We use it in production.

Elman answered 26/9, 2017 at 21:19 Comment(0)
