Can git be made to mostly auto-merge XML order-insensitive files?

Asked 16/6, 2016 at 4:12 Answered 5/8, 2024 at 11:21

When merging LightSwitch branches, colleagues have often added properties to an entity I also modified. This results in a merge conflict, because the new XML entries are added at the same position in the lsml file.

I can effectively always solve these by accepting left and right in no particular order, so one goes above the other, as order isn't important in these particular instances. In the rare cases where this isn't valid, it would produce an error in the project anyway, which I accept as a risk (but haven't encountered).

Is there a way (preferably for the file extension) to get git to automatically accept source and target changes for the same position and simply place one beneath the other?

Gravity answered 16/6, 2016 at 4:12 Comment(4)

Union merge almost works. See #1682262 and #36830644 – Axolotl 16/6, 2016 at 4:29

Thanks, I saw the answers about custom merge drivers, but was wondering if there was a more simple approach. The answer I guess I asked for was union merge, but that seems to suggest that disaster is the only likely outcome from using it. – Gravity 16/6, 2016 at 5:47

@Torek if you make your comment an answer, I'll accept it. – Gravity 16/6, 2016 at 23:25

Actually, I was hoping to maybe write a custom driver to do some kind of XML union merge. But I followed the links myself, found some papers on string-to-string edit algorithms for trees, and one actual Python implementation of an XML diff (Sylvain Thénault's xmldiff: pypi.python.org/pypi/xmldiff), and then lost the plot in the weeds. :-) I'll put that much in an answer though. – Axolotl 16/6, 2016 at 23:39

This gets pretty hard in general.

Some have attempted to use Git's union merge (which is more accessible now than it was in early days; as in that question, you just add merge=union in a .gitattributes file), but this does not work in general. It might work sometimes. Boiling it down a lot, it works if your XML is always structured so that naive line-oriented union merge produces valid XML (basically, keeping whole XML sub-elements all on one line), and you are always adding whole new XML sub-elements.
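To make that condition concrete, here is a toy line-oriented union in Python. It is only a sketch and not Git's actual algorithm: the function names are made up for illustration, deletions are ignored entirely, and lines that both sides add at the same position are kept once, which is a crude stand-in for how a line-based merge collapses identical additions.

```python
import difflib
import xml.etree.ElementTree as ET


def insertions(base, side):
    """Map a base line index to the lines `side` added just before it."""
    added = {}
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    for tag, i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.setdefault(i1, []).extend(side[j1:j2])
    return added


def union_lines(base, ours, theirs):
    """Toy union: keep every base line plus the additions from both sides."""
    a, b = insertions(base, ours), insertions(base, theirs)
    out = []
    for i in range(len(base) + 1):
        mine = a.get(i, [])
        out.extend(mine)
        # Assumption: a line added identically by both sides is kept once.
        out.extend(line for line in b.get(i, []) if line not in mine)
        if i < len(base):
            out.append(base[i])
    return out


def is_well_formed(lines):
    """True if the joined lines parse as XML."""
    try:
        ET.fromstring("\n".join(lines))
        return True
    except ET.ParseError:
        return False


# Whole sub-elements on one line: the union is still well-formed XML.
base = ['<Entity>', '  <Prop Name="Id"/>', '</Entity>']
ours = base[:2] + ['  <Prop Name="A"/>'] + base[2:]
theirs = base[:2] + ['  <Prop Name="B"/>'] + base[2:]
print(is_well_formed(union_lines(base, ours, theirs)))  # True

# Sub-elements spread over several lines: shared lines collapse and break it.
base2 = ['<Entity>', '</Entity>']
ours2 = ['<Entity>', '  <Prop Name="A">', '    <Attr/>', '  </Prop>', '</Entity>']
theirs2 = ['<Entity>', '  <Prop Name="B">', '    <Attr/>', '  </Prop>', '</Entity>']
print(is_well_formed(union_lines(base2, ours2, theirs2)))  # False
```

The second case fails because the closing lines of the two new elements are textually identical, so a line-based tool cannot tell that each element needs its own copy.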

It is possible, in Git, to write a custom merge driver. Writing a useful one for XML is hard.

First we need an XML diff engine, such as Sylvain Thénault's xmldiff, to construct two string-to-string (or tree-to-tree) edit scripts from three XML files (the merge base, the local or --ours file, and the other or --theirs file): diff base-vs-local and base-vs-other. This particular one looks like it works similarly to Python's difflib. (However, judging by the referenced papers, it looks like it produces tree move / nesting-level operations as well as simple insert and delete. This is a natural and reasonable thing for a tree-to-tree edit algorithm to do, and probably actually desirable here.)

Then, given two such diffs, we need code to combine them. The union method is to ignore all deletions: simply add all additions to the base version (or, equivalently, add the "other" additions to the "local", or the "local" additions to the "other"). We could also combine tree insert/delete operations a la "real" (non-union-style) merges, and perhaps even declare conflicts. (And it might be nice to allow different handling of tree nesting-level-changes, driven by something vaguely like a DTD.)
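For the simplest union case (add the "other" additions to the "local" file), a tree-level version can be sketched with Python's standard xml.etree.ElementTree instead of a real diff engine. This is a rough illustration under stated assumptions, not a finished driver: the `Name` attribute used as an identity key and the function names are hypothetical, deletions are ignored, and where both sides changed a matching element the local version simply wins.

```python
import xml.etree.ElementTree as ET


def key(elem):
    # Hypothetical identity rule: same tag and same "Name" attribute.
    # Siblings that share a tag and have no Name are treated as one element.
    return (elem.tag, elem.get("Name"))


def union_into(ours, theirs):
    """Append to `ours` each child of `theirs` it lacks; recurse into matches."""
    mine = {key(child): child for child in ours}
    for child in theirs:
        match = mine.get(key(child))
        if match is None:
            ours.append(child)        # addition made only on the other side
        else:
            union_into(match, child)  # present on both sides: merge deeper


def merge_files(base_path, ours_path, theirs_path):
    """Union-merge theirs into ours, writing the result over ours_path.

    base_path is accepted only to mirror the three files a merge involves;
    a union that ignores deletions never needs to read it.
    """
    ours = ET.parse(ours_path)
    theirs = ET.parse(theirs_path)
    union_into(ours.getroot(), theirs.getroot())
    ours.write(ours_path, encoding="utf-8", xml_declaration=True)
    return 0  # a merge driver reports success with exit status 0
```

A merge driver script would call `merge_files` with the three paths Git passes for `%O`, `%A` and `%B`, since Git expects the result to be left in the `%A` file. Note that ElementTree may rewrite namespace prefixes and whitespace on output, so a real driver would probably need a more careful serializer.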

These last parts have not, as far as I know, been done anywhere. Besides that, the Python xmldiff I linked here is a fairly big chunk of code. I have not read it anywhere near closely, nor attempted to install it; I just downloaded and skimmed it. It implements both a Myers-like algorithm and the fancier "fast match / edit script" algorithm from the Stanford paper.

Axolotl answered 17/6, 2016 at 0:11 Comment(1)

It sounds like something EVERYONE needs, so it has probably been solved already. Does anyone know of a good open-source tool for this? – Igorot 1/8, 2018 at 6:11

To "auto-merge" XML, additional information is needed that is not present in the XML itself and not even in the XML schema.

The settings window of Oso XML Merge, a tool that can do that, gives an overview of what that information is:

For each XML element, we need to know:

Whether its children can be rearranged while searching for matches and during the merge
- For "dictionary" elements (representing heterogeneous data), they usually can. For "sequence" elements (representing homogeneous data, including markup blocks), it can go either way, depending on whether the element represents an ordered or an unordered sequence.
How to find corresponding elements in a sequence
- In "data" XML, there's typically a "key" node which is supposed to represent a unique identifier. It can be either the element's text or a property (attribute or text) of some child element.
Whether whitespace and whitespace changes in text nodes are significant
- In "data" XML, leading/trailing whitespace usually isn't. In "markup" XML, it can be.

As you can see, this information is roughly specific to a given XML schema, so you need to somehow store it and provide it to whatever merge logic you'll be using, for each XML schema encountered in your repository.

Some way to match these settings to actual files is needed as well (as of this writing, Oso XML Merge uses a file name pattern and the root tag name for that; if there are multiple matches, it asks the user).
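As a rough illustration of how such per-schema information could be stored and looked up, here is a small Python sketch. It is not how Oso XML Merge works internally; every name in it (`ElementRule`, `SchemaProfile`, `find_profiles`, `pair_children`, `same_text`) is hypothetical, and for brevity the key is assumed to be an attribute, although it could equally be element text or a property of a child.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from itertools import zip_longest
from typing import Dict, Optional
import xml.etree.ElementTree as ET


@dataclass
class ElementRule:
    reorderable: bool = False        # may the children be rearranged?
    key_attr: Optional[str] = None   # attribute that identifies a child
    strip_text: bool = True          # ignore leading/trailing whitespace?


@dataclass
class SchemaProfile:
    file_pattern: str                # e.g. "*.lsml"
    root_tag: str                    # expected root element name
    rules: Dict[str, ElementRule] = field(default_factory=dict)  # by parent tag


def find_profiles(profiles, filename, root):
    """All profiles whose file name pattern and root tag both match."""
    return [p for p in profiles
            if fnmatch(filename, p.file_pattern) and p.root_tag == root.tag]


def pair_children(a, b, rule):
    """Pair the children of a and b; an unmatched child is paired with None."""
    if rule.reorderable and rule.key_attr:
        b_by_key = {c.get(rule.key_attr): c for c in b}
        pairs = [(c, b_by_key.pop(c.get(rule.key_attr), None)) for c in a]
        pairs += [(None, c) for c in b_by_key.values()]
        return pairs
    # Ordered sequence (or no key known): fall back to positional pairing.
    return list(zip_longest(a, b))


def same_text(a, b, rule):
    """Compare text nodes, optionally ignoring surrounding whitespace."""
    ta, tb = a.text or "", b.text or ""
    if rule.strip_text:
        ta, tb = ta.strip(), tb.strip()
    return ta == tb
```

If `find_profiles` returns more than one profile, the caller has to pick one somehow, for example by asking the user.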

Boehmenist answered 5/8, 2024 at 11:21 Comment(0)
