I've been struggling with this exact problem for the last few days and have written a small .NET utility to extract and normalise Excel files in such a way that they're much easier to store in source control. I've published the executable here:
https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe
...and the source here:
https://bitbucket.org/htilabs/ooxmlunpack
At the moment it isn't configurable: put the executable in a folder (e.g. the root of your source repository) and run it. It will:
- Scan the folder and its subfolders for any .xlsx and .xlsm files
- Take a copy of each file as *.orig
- Unzip each file and re-zip it with no compression
- Pretty-print any files in the archive which are valid XML
- Delete the calcChain.xml file from the archive (since it changes a lot and doesn't affect the content of the file)
- Inline any unformatted text values (otherwise these are kept in a shared-strings lookup table, which causes big changes in the internal XML if even a single cell is modified)
- Delete the values from any cells which contain formulas (since they can just be calculated when the sheet is next opened)
- Create a subfolder *.extracted, containing the extracted zip archive contents
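To give a feel for the "re-zip with no compression" and "pretty-print" steps above, here is a rough Python sketch. It is not the OoXmlUnpack source (that is .NET), and the function name `normalise_workbook` is made up for illustration. It also deliberately skips the calcChain, shared-string and formula-value steps, since those involve editing other parts of the package consistently.

```python
import zipfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def normalise_workbook(src_path, dst_path):
    """Copy an .xlsx/.xlsm archive, storing every entry uncompressed
    and pretty-printing the entries that parse as XML.

    Hypothetical helper for illustration only; src_path and dst_path
    must be different files.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            try:
                # Naive pretty-print: one element per line, two-space indent.
                data = minidom.parseString(data).toprettyxml(
                    indent="  ", encoding="UTF-8")
            except (ExpatError, ValueError):
                # Not XML (e.g. vbaProject.bin, images): copy bytes unchanged.
                pass
            # Keep the original timestamp so repeated runs on the same input
            # do not differ only by "now"; ZipInfo defaults to ZIP_STORED.
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            dst.writestr(entry, data)
```

Called as `normalise_workbook("Book1.xlsx", "Book1.normalised.xlsx")`, this should leave a file in which every part is stored as plain bytes, so a text diff or a delta-compressing repository sees the XML itself rather than a deflate stream. Whether Excel is happy with the result depends on the workbook, so treat it as a starting point rather than a drop-in replacement for the tool.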
Clearly not all of these things are necessary, but the end result is a spreadsheet file that will still open in Excel but which is much more amenable to diffing and incremental compression. Also, storing the extracted files as well makes it much more obvious in the version history what changes have been applied in each version.
If there's any appetite out there, I'm happy to make the tool more configurable since I guess not everyone will want the contents extracted, or possibly the values removed from formula cells, but these are both very useful to me at the moment.
In tests, a 2MB spreadsheet 'unpacks' to 21MB, but I was then able to store five versions of it, with small changes between each, in a 1.9MB Mercurial data file, and to visualise the differences between versions effectively using Beyond Compare in text mode.
git has the hook behavior that will allow this, but I don't know about hg – Bereave