I'm posting this as a Q&A to document a workaround for a problem that seems to come up frequently—how to put MySQL Workbench files under version control—but for which I have been unable to find any solutions. Feedback is welcome!
How can I tell Mercurial to diff
the contents of a zipped archive and ignore some of the changes to those contents? Specifically, how can I use hg
to diff the contents of a MySQL Workbench (.mwb
) file, ignoring the many unimportant changes that MySQL Workbench makes every time the file is opened? Can I use a custom script that ignores certain irrelevant changes?
Background
I am trying to diff
a file in an hg repository. The file, document.mwb.xml
, is an XML document extracted from a .mwb
file (a MySQL Workbench model file). Basically, I am looking to keep the model's contents—the table structure, visual model, etc.—under version control, but not commit the .mwb
file itself, which is a zip archive and thus a binary file.
Anytime I save the .mwb
file, I unzip it. I keep the unzipped contents in my repository, and just zip them up again when I need to work with the .mwb
in MySQL.
The XML in question looks like this:
<?xml version="1.0"?>
<data grt_format="2.0" document_type="MySQL Workbench Model" version="1.4.4">
<value type="object" struct-name="workbench.Document" id="8551CCFA-3AD0-4207-BC76-15ED589CF22C" struct-checksum="0x7131bf99">
<value type="object" struct-name="workbench.logical.Model" id="B48E1CD2-3386-40B7-8E59-AA191598F667" struct-checksum="0xf4220370" key="logicalModel">
<value _ptr_="0x7fbcd1cc3270" type="list" content-type="object" content-struct-name="workbench.logical.Diagram" key="diagrams"/>
<value _ptr_="0x7fbcd1cc3210" type="dict" key="customData"/>
<value _ptr_="0x7fbcd1cc32d0" type="list" content-type="object" content-struct-name="model.Marker" key="markers"/>
<value _ptr_="0x7fbcd1cc3330" type="dict" key="options"/>
<value type="string" key="name"></value>
<link type="object" struct-name="GrtObject" key="owner">8551CCFA-3AD0-4207-BC76-15ED589CF22C</link>
</value>
<value _ptr_="0x7fbcd1cc2b70" type="list" content-type="object" content-struct-name="workbench.OverviewPanel" key="overviewPanels"/>
<value _ptr_="0x7fbcd1cc2c00" type="list" content-type="object" content-struct-name="workbench.physical.Model" key="physicalModels">
<value type="object" struct-name="workbench.physical.Model" id="34B9E967-5C9B-4D1B-8759-C417F6C33AA3" struct-checksum="0x5f896d18">
...
The problem is all of those _ptr_
attributes: there are literally thousands of them in this file, and every one of them changes every single time the file is saved, even if nothing is modified. As a result, the repository can rapidly get cluttered with completely meaningless "changes" to this file.
Is there a way to use a custom diff
routine to ignore these irrelevant changes?