To hook or not to hook - git
Our bespoke IDE outputs XML files with an encoding that makes them look like binary files. Diffs and merges of these files fail.

We can create ASCII versions of these files with the tr command. I would like to get to a state where these files are always automatically converted to ASCII before they are committed.

I picked up my copy of Version Control with Git and it wholeheartedly warns me away from using hooks unless I really need to.

Should I be using a hook for this purpose? Or can I do something else to ensure the files are always converted before commit?

Windows XP with msysgit 1.7.4

--= update =--

Thanks everyone for your help and patience. After looking at this question, I tried the following, but it does not work:

echo "*.xrp    filter=xrp" > .git/info/attributes
git config --global filter.xrp.clean 'tr -cd '\''\11\12\15\40-\176'\'''
git config --global filter.xrp.smudge cat
git checkout --force

The files remain unchanged after this config change, even when I delete them and check them out again.

The tr command configured as the clean task does work in isolation. Proof:

$ head -n 1 cashflow/repo/C_GMM_CashflowRepo.xrp
ÿþ< ! - -   X M L   R e p o s i t o r y   f i l e   1 . 0   - - >

$ tr -cd '\''\11\12\15\40-\176'\' < cashflow/repo/C_GMM_CashflowRepo.xrp | head -n 1
<!-- XML Repository file 1.0 -->

Can anyone see what is wrong with my config?

Jovi answered 29/6, 2011 at 7:33 Comment(2)
How does a failed diff or merge manifest? In what way does a merge fail? – Flag
Diff responds with: "binary files differ". Good question though, I am only assuming that the merge would fail as a consequence of being unable to diff. Regardless, having the ability to diff would be nice. – Jovi

One issue with hooks is that they aren't distributed.

.gitattributes has directives that control how a file is diffed and how its content is handled. Another option, also declared in .gitattributes, is an attribute filter, which can automatically convert those files on commit.
(That assumes the clean script is able to detect those files based on their content alone.)


Per this chat discussion, the OP Synesso reports a success:

.gitattributes:
*.xrp filter=xrp

~/.gitconfig:
[filter "xrp"]
clean = \"C:/Program Files/Git/bin/tr.exe\" -cd "\\''\\11\\12\\15\\40-\\176'\\'"
smudge = cat

Then I had to modify the file, add, commit, delete, checkout ... and THEN it was fixed. :)
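
The reason a plain git checkout --force appeared to do nothing is that the clean filter runs when a file is staged (git add), while checkout only runs the smudge filter. Here is a minimal sketch of that behaviour in a scratch repository. The file name sample.xrp, the fake UTF-16-like content and the identity settings are made up for illustration, and a Unix-like shell (such as Git Bash) with tr on the PATH is assumed:

```shell
# Sketch only: throwaway repository; sample.xrp and its content are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# Declare the filter in .gitattributes and define it in the repository config.
echo '*.xrp filter=xrp' > .gitattributes
git config filter.xrp.clean "tr -cd '\11\12\15\40-\176'"
git config filter.xrp.smudge cat

# Fake content: a byte-order mark, then characters interleaved with NUL bytes.
printf '\377\376<\000!\000-\000-\000\n\000' > sample.xrp

# The clean filter runs here, at staging time, not at checkout time.
git add .gitattributes sample.xrp
git commit -q -m "Add filtered sample"

# Inspect what was actually stored in the repository.
stored=$(git cat-file -p HEAD:sample.xrp)
echo "stored blob: $stored"
```

The working copy keeps its original bytes (smudge is just cat), so the ASCII version only shows up in the working tree after the file is deleted and checked out again, which matches the sequence described above.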

Note that for any modification which concerns not just one user but potentially anyone cloning the repo, I prefer adding (and committing) an extra .gitattributes file in which the filter is declared, rather than modifying the .git/info/attributes file (which isn't cloned around).

From the gitattributes man page:

  • If you wish to affect only a single repository (i.e., to assign attributes to files that are particular to one user’s workflow for that repository), then attributes should be placed in the $GIT_DIR/info/attributes file.
  • Attributes which should be version-controlled and distributed to other repositories (i.e., attributes of interest to all users) should go into .gitattributes files.
  • Attributes that should affect all repositories for a single user should be placed in a file specified by the core.attributesfile configuration option.
  • Attributes for all users on a system should be placed in the $(prefix)/etc/gitattributes file.

http://git-scm.com/docs/gitattributes
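
To check which of these locations is actually taking effect, git check-attr can be asked what it resolves for a given path. A small sketch, assuming a scratch repository and a hypothetical path example.xrp (the path does not need to exist on disk):

```shell
# Sketch only: throwaway repository; example.xrp is a hypothetical path.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Per-repository and not cloned: $GIT_DIR/info/attributes
mkdir -p .git/info
echo '*.xrp filter=xrp' > .git/info/attributes
local_only=$(git check-attr filter -- example.xrp)

# Version-controlled and distributed: .gitattributes in the working tree
rm .git/info/attributes
echo '*.xrp filter=xrp' > .gitattributes
shared=$(git check-attr filter -- example.xrp)

echo "$local_only"
echo "$shared"
```

Both locations resolve the same attribute; the difference is only in whether the setting travels with the repository.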


phyatt adds in the comments:

I made an example similar to this for sqlite3.
You can add it into the correct files with two lines:

git config diff.sqlite3.textconv 'sqlite3 $1 .dump'
echo '*.db diff=sqlite3' >> $(git rev-parse --show-toplevel)/.gitattributes 

Similar lines can be used for writing other git config paths.

Infant answered 29/6, 2011 at 8:3 Comment(12)
Thanks. Attribute filters sound equally interesting. – Jovi
I found the attribute filter most useful. I eventually got to this question: #2317177 - I attempted a solution but it does not work. Question updated. – Jovi
@Synesso: did you try checking out the whole repository again somewhere else? – Infant
That confuses me. How would that even work? I've got the necessary config in .git/info/attributes, which can't be committed. If I check out again, that config will not be present. – Jovi
@Synesso: simply copy the .git directory somewhere else, and then git checkout yourBranch ;) – Infant
It failed to copy half a dozen hardlinks. Checkout failed. I suspect I'm hitting Windows issues. – Jovi
@Synesso: in that case, clone it (git clone --no-hardlinks) and then recreate the local config. Except I realize you are modifying the .git/info/attributes file. I would rather put that setting in a .gitattributes file that I can: a/ commit, b/ distribute when cloning my repo. – Infant
@Jovi let us continue this discussion in chat – Infant
All great answers here, but @Infant went to great lengths to help out. Thanks! – Jovi
@Synesso: you're welcome :) I took the liberty of including your conclusion in this answer, for others to see. And I detailed the difference between info/attributes and .gitattributes files. – Infant
Thanks for posting this. I made an example similar to this for sqlite3. You can add it into the correct files with two lines: git config diff.sqlite3.textconv 'sqlite3 $1 .dump'; echo '*.db diff=sqlite3' >> $(git rev-parse --show-toplevel)/.gitattributes Similar lines can be used for writing other git config paths. – Kabul
@Kabul Thank you for this feedback. I have included your comment in the answer for more visibility. – Infant

Does diff stand a chance of working on them as is (i.e. they just contain a handful of strange bytes but are otherwise text) or not? If it does, you can just force git to treat them as text with .gitattributes. If not, it still might be better to create custom diff and merge scripts (that will use tr as needed to convert) and tell git to use them, again with .gitattributes.

In either case you will not be using hooks (those run on particular operations), but .gitattributes, which are file-specific.
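
As a rough illustration of the custom-diff route, a textconv driver can run the same tr conversion only when a diff is displayed, leaving the stored bytes alone. This is a sketch under assumptions: the driver name xrp, the wrapper script xrp2txt.sh and the sample content are all hypothetical, and merging is not covered here:

```shell
# Sketch only: throwaway repository; xrp2txt.sh and sample.xrp are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# Route *.xrp files through a diff driver named "xrp".
echo '*.xrp diff=xrp' > .gitattributes

# textconv is called with the file name as its argument and must print text.
cat > xrp2txt.sh <<'EOF'
#!/bin/sh
tr -cd '\11\12\15\40-\176' < "$1"
EOF
chmod +x xrp2txt.sh
git config diff.xrp.textconv "$repo/xrp2txt.sh"

# Commit one version, then change the file in the working tree.
printf '\377\376a\000\n\000' > sample.xrp
git add .
git commit -q -m "First version"
printf '\377\376b\000\n\000' > sample.xrp

# The diff is computed on the converted text rather than the raw bytes.
diff_out=$(git diff -- sample.xrp)
echo "$diff_out"
```

Since the conversion drops information, the resulting diff is for reading only and is not meant to be applied as a patch.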

Outskirts answered 29/6, 2011 at 7:56 Comment(1)
The files are XML, but when I look at them in hexdump each byte is interspersed with 00. As a result they are treated as binary by diff. Your solution sounds very easy. I'll try it out tomorrow.Jovi

If your preferred editing format were ASCII and only your builds required the binary files, I would recommend using build rules to generate the binary version from the preferred source, which you would commit to the repository.

Given that your IDE makes the files in the binary format already, I think the best thing is to store them in the repository in that format.

Rather than hooks, look at git help attributes, especially diff and textconv which allow you to configure files matching certain patterns to use alternate means of diffing. You should be able to produce working ASCII diffs without having to compromise how you store the files or edit them.

EDIT: Based on your comment elsewhere that "every other byte is 0", that suggests the file is UTF-16 or UCS-2. See this answer for a diff which can handle Unicode: Can I make git recognize a UTF-16 file as text?
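
If the files really are UTF-16, a proper re-encoding may be preferable to stripping bytes. A minimal sketch using iconv, assuming it is installed and that the file starts with a byte-order mark; the file name and content here are invented for the example:

```shell
# Sketch only: sample-utf16.xml and its content are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Little-endian UTF-16 with a byte-order mark, as suggested by the "ÿþ" prefix.
printf '\377\376<\000a\000>\000\n\000' > sample-utf16.xml

# Re-encode to UTF-8; the BOM tells iconv which byte order to use.
converted=$(iconv -f UTF-16 -t UTF-8 sample-utf16.xml)
echo "$converted"
```

Unlike tr -cd, this keeps any non-ASCII characters instead of discarding them. The same command could plausibly serve as a textconv setting, since git appends the file name to the configured command.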

Musket answered 29/6, 2011 at 7:56 Comment(4)
Thanks. I should have made it clear that the IDE is a fruitcake and writes the files as binary when they really are not. There's no benefit at all to their being binary. Thanks for your helpful answer! – Jovi
+1 for generating the XML files on-demand and only committing the source document(s) that the XML gets generated from. – Sibyls
@Jovi based on your comment I added a link to a related question. – Musket
To be clear, the UTF-16 files are the source documents. The IDE will happily read them in again after conversion to 8-bit format. – Jovi
