Hg sub-repository dependencies

Asked 11/4, 2011 at 12:59 Answered 26/9, 2011 at 14:40

There have been a couple of questions about Hg sub-repo dependencies in the past (here and here) but the accepted answers don't seem to address the problem for me.

A project of mine has 4 dependencies: A, B, C, D. D is dependent on A, B and C; and B and C are dependent on A:

[Image: dependency graph of A, B, C, D]

I want to use Hg sub-repositories to store them so I can track what version of each they rely on. This is because, while I am using A,B,C and D in this project, other projects will require just A and B. Therefore B and C must track what version of A they need independently of D. At the same time, in my application the versions of B and C referenced by a given version of D must always use the same version of A as that referenced by the given version of D (otherwise it will just fall over at runtime). What I really want is to allow them to reference each other as siblings in the same directory - i.e. D's .hgsub would look like the following, and B and C's would look like the first line.

..\A = https:(central kiln repo)\A
..\B = https:(central kiln repo)\B
..\C = https:(central kiln repo)\C

However, this doesn't seem to work. I can see why (it'd be easy to give people enough rope to hang themselves with), but it's a shame, as I think it's the neatest solution to my dependencies. I've read a few suggested solutions, which I'll quickly outline along with why they don't work for me:

Include copies as nested sub-directories, reference these as Hg sub-repositories. This yields the following directory structure (I've removed the primary copies of A, B, C, B\A, C\A as I can accept referencing the copies inside \D instead):
- project\ (all main project files)
- project\D
- project\D\A
- project\D\B
- project\D\B\A
- project\D\C
- project\D\C\A
Problems with this approach:
- I now have 3 copies of A on disk, all of which could have independent modifications which must be synced and merged before pushing to a central repo.
- I have to use other mechanisms to ensure that B, C and D are referencing the same version of A (e.g. D could use v1 while D\B could use v2)
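One such "other mechanism" could be a small script that compares the revision of A pinned by each nested copy. The sketch below assumes each repo's `.hgsubstate` holds one `<revision> <path>` pair per line; the helper names and the revision hashes are made up for illustration, and the file contents are passed in as strings rather than read from disk.

```python
from typing import Dict


def parse_hgsubstate(text: str) -> Dict[str, str]:
    """Map subrepo path -> pinned revision, assuming '<revision> <path>' lines."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        revision, path = line.split(" ", 1)
        pins[path] = revision
    return pins


def pinned_revisions(substates: Dict[str, str], subrepo: str = "A") -> Dict[str, str]:
    """For each repo label, the revision it pins for `subrepo` (if any)."""
    found: Dict[str, str] = {}
    for repo, text in substates.items():
        pins = parse_hgsubstate(text)
        if subrepo in pins:
            found[repo] = pins[subrepo]
    return found


def is_consistent(substates: Dict[str, str], subrepo: str = "A") -> bool:
    """True when every repo that pins `subrepo` pins the same revision."""
    return len(set(pinned_revisions(substates, subrepo).values())) <= 1


# Made-up revisions standing in for real changeset hashes.
V1 = "1" * 40
V2 = "2" * 40

# Hypothetical .hgsubstate contents of D, D\B and D\C.
SUBSTATES = {
    "D": f"{V1} A\n{'b' * 40} B\n{'c' * 40} C\n",
    "D/B": f"{V2} A\n",  # B has drifted to a different version of A
    "D/C": f"{V1} A\n",
}

if __name__ == "__main__":
    print(pinned_revisions(SUBSTATES))
    print(is_consistent(SUBSTATES))
```

A check like this could run before a commit or a build, but it only detects the mismatch; it does nothing to stop the three copies from diverging in the first place.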
A variation: use the above, but make the RHS of the .hgsub point to the copy held by the parent repository (i.e. B and C should have the .hgsub below):
```
A = ..\A
```
Problems with this approach:
- I still have three copies on disk
- The first time I clone B or C it will attempt to recursively pull the referenced version of A from "..\A", which may not exist, presumably causing an error. If it doesn't exist it gives no clue as to where the repo should be found.
- When I do a recursive push of changes, the changes in D\B\A do not go into the shared central repo; they just get pushed to D\A instead. So if I push twice in a row I can guarantee that all changes will have propagated correctly, but this is quite a fudge.
- Similarly if I do a (manual) recursive pull, I have to get the order right to get the latest changes (i.e. pull D\A before I pull D\B\A)
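The ordering problem in the last two points is a topological sort over "which copy pulls from which". A minimal sketch, assuming Python 3.9+ for `graphlib`; the `PULLS_FROM` map is a hypothetical description of the layout above (forward slashes are just labels here), and nothing in it actually invokes hg.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical map: working copy -> the local copies it pulls from.
# D/A pulls from the central repo, so it has no local predecessors.
PULLS_FROM: Dict[str, List[str]] = {
    "D/A": [],
    "D/B/A": ["D/A"],
    "D/C/A": ["D/A"],
}


def pull_order(pulls_from: Dict[str, List[str]]) -> List[str]:
    """Copies ordered so that each one is pulled after its sources."""
    return list(TopologicalSorter(pulls_from).static_order())


def push_order(pulls_from: Dict[str, List[str]]) -> List[str]:
    """Pushing goes the other way: innermost copies first."""
    return list(reversed(pull_order(pulls_from)))


if __name__ == "__main__":
    print(pull_order(PULLS_FROM))
    print(push_order(PULLS_FROM))
```

A wrapper script could walk these lists and run the pull or push in each directory, which avoids the "push twice in a row" fudge but is still extra tooling to maintain.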
Use symlinks to point folder \D\B\A to D\A etc.

Problems with this approach:
- symlinks cannot be encoded in the Hg repo itself, so every time a team member clones the repo they have to re-create the symlinks, either manually or with a script. This may be acceptable, but I'd prefer a better solution. Also (personal preference) I find symlinks highly unintuitive.

Are these the best available solutions? Is there a good reason why my initial .hgsub (see top) is a pipe-dream, or is there a way I can request/implement this change?

UPDATED to better explain the wider usage of A,B,C,D

Decoupage answered 11/4, 2011 at 12:59 Comment(0)

Instead of trying to manage your dependencies via Mercurial (or with any SCM for that matter), try using a dependency management tool, such as Apache Ivy.

Using an Ivy-based approach, you don't have any sub-repos; you would just have projects A, B, C and D. A produces an artifact (e.g. a .jar, .so or .dll), which is published with a version into an artifact repository (basically a place where you keep your build artifacts). Projects B and C can then depend on a specific version of A (controlled via an ivy.xml file in each project), which Ivy will retrieve from the artifact repository. Projects B and C also produce artifacts that are published to your repository. Project D depends on B and C, and Ivy can be told to retrieve the dependencies transitively, which means it will get the artifacts for B, C and A (because B and C depend on A).
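To make the transitive step concrete, here is a toy resolver in Python. It is not Ivy and does not read ivy.xml; the `CATALOG` contents and version numbers are invented, and it simply fails when two projects pin different versions of the same dependency, whereas Ivy has configurable conflict managers for that case.

```python
from typing import Dict, List, Tuple

Module = Tuple[str, str]  # (project name, version)

# Hypothetical published metadata: module -> its pinned direct dependencies.
CATALOG: Dict[Module, List[Module]] = {
    ("A", "1.0"): [],
    ("B", "2.1"): [("A", "1.0")],
    ("C", "0.9"): [("A", "1.0")],
    ("D", "3.0"): [("B", "2.1"), ("C", "0.9")],
}


def resolve(root: Module, catalog: Dict[Module, List[Module]]) -> Dict[str, str]:
    """Collect root and everything it needs transitively as name -> version.

    Raises ValueError if two paths through the graph pin different
    versions of the same project.
    """
    resolved: Dict[str, str] = {}
    stack: List[Module] = [root]
    while stack:
        name, version = stack.pop()
        seen = resolved.get(name)
        if seen is not None:
            if seen != version:
                raise ValueError(
                    f"{name} is pinned to both {seen} and {version}"
                )
            continue
        resolved[name] = version
        stack.extend(catalog[(name, version)])
    return resolved


if __name__ == "__main__":
    print(resolve(("D", "3.0"), CATALOG))
```

Resolving D pulls in B, C and a single A. If C were republished against a different A, the resolve of D would fail up front, which is the "same version of A everywhere" guarantee the question asks for, just enforced at dependency resolution time instead of in the SCM.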

A similar approach can be used with Apache Maven and Gradle (the latter uses Ivy).

The main advantages are that:

- it makes it very clear what versions of each component a project is using (sometimes people forget to check .hgsub, so they don't know they are working with subrepos),
- it makes it impossible to accidentally change a project you depend on (as you are working with artifacts, not code),
- it saves you from having to rebuild the projects you depend on and from being unsure of what version you are using,
- it saves you from having multiple redundant copies of projects that are used by other projects.

EDIT: Similar answer with a slightly different spin at Best Practices for Project Feature Sub-Modules with Mercurial and Eclipse?

Compline answered 26/9, 2011 at 14:40 Comment(2)

+1 for distinguishing built artifacts from source. Trying to manage dependencies through SCM falls apart as soon as you trip over something that the SCM cannot control. – Theressa 19/10, 2011 at 19:53

This seems to me like the best approach, and the approach I would take if I were starting something from scratch. However, our organization has some fairly poor separation of concerns between dependent projects, as well as some poorly defined interfaces to framework-type projects. This leads to lots of hopping between dependent and dependency to address a single requirement, and also to frequent refactoring (aided by ReSharper). For these reasons, it has suited us better to use subrepos than, say, NuGet packages, for managing in-house dependencies. – Breckenridge 17/12, 2012 at 21:30

You say you want to track which version they each rely on, but you'd also be happy with a single copy of A shared between B, C and D. These are mutually exclusive: with a single copy of A, any change to A will cause a change in the .hgsubstate of each of B, C and D, so there is no independence in the versioning (as all of B, C and D will commit after a change to A).

Having separate copies will be awkward too. If you make a change that affects both B's copy of A and C's copy then attempt to push the whole structure, the changes to (say) B will succeed but the changes to C will fail because they require merging with the changes you just pushed from B, to avoid creating new heads. And that will be a pain.

The way I would do this (and maybe there are better ways) would be to create a D repo with subrepos of A, B and C. Each of B and C would have some untracked A-location file (which you're prompted to enter via a post-clone hook), telling your build system where to look for its A repository. This has the advantage of working, but you lose the convenience of a system which tracks concurrent versions of {B, C} and A. Again, you could do this manually with an A-version file in each of B and C, written by one hook and read by another, and you could make that work, but I don't think it's possible using the subrepos implementation in hg. My suggestions really boil down to implementing a simplified subrepo system of your own.
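As a rough sketch of the build-system side of the A-location idea: the file name `.a-location`, its one-line format, and the function name are all assumptions for illustration, and the hook that would create the file is left out.

```python
from pathlib import Path

# Hypothetical untracked file in B or C holding the path to A's working copy.
A_LOCATION_FILE = ".a-location"


def resolve_a_location(project_dir: Path) -> Path:
    """Return the directory of A that this project should build against.

    A relative path in the file is interpreted relative to project_dir.
    """
    marker = project_dir / A_LOCATION_FILE
    if not marker.is_file():
        raise FileNotFoundError(
            f"{marker} is missing; create it with the path to repository A"
        )
    raw = marker.read_text(encoding="utf-8").strip()
    location = Path(raw)
    if not location.is_absolute():
        location = project_dir / location
    return location.resolve()
```

Inside D, the file in B and C would contain something like `../A`, so both resolve to the one copy that D pins. In a standalone clone of B it would point wherever that developer keeps A.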

Effect answered 12/4, 2011 at 23:50 Comment(2)

I've updated my question to better explain - B can (and will) be used independently of D, and so must track a version of A. Just because in my application I want B,C and D to use the same version of A (at a given point in time), doesn't mean that is the only way they will be used (otherwise I'd just give up on sub-repos and include the code directly). In the general case, they must be able to be independent; but in this project they must all use the same version of A. – Decoupage 13/4, 2011 at 8:59

That was how I understood the description originally, but if you want to use B and C in these two different ways, I don't think you can include the location of A in the repositories and still expect it to work. Have you considered having separate B2 and C2 repositories which have sub-repos of {B,C} and A, used for development (and hence tracking consistent versions of A)? Note that D would have subrepos of B and C still, not B2 and C2. Because what you're doing isn't directly possible with hg subrepos (as they stand), you're going to have to do something hacky at some point. – Effect 13/4, 2011 at 11:41
