Git says "Binary files a... and b... differ" for *.reg files
A

6

17

Is there a way to force Git into treating .reg files as text? I am using Git to track my Windows registry tweaks, and Windows uses .reg for these files.

UPDATE 1: I got it to run a diff (thanks, Andrew). However, now it looks like this below. Is this an encoding issue?

index 0080fe3..fc51807 100644
--- a/Install On Rebuild/4. Registry Tweaks.reg
+++ b/Install On Rebuild/4. Registry Tweaks.reg
@@ -1,49 +1,48 @@
-<FF><FE>W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ^@E^@d^@i^@t^@o^@r^@
-^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;
-^@^M^@
...

Any ideas?

UPDATE 2: Thanks to all who helped. Here's what I did in the end: I created a .gitattributes file with the content *.reg text diff, and then converted the files to UTF-8, since UTF-16 is weird with diffs. I'm not using any foreign characters, so UTF-8 works for me.

Anodic answered 18/2, 2011 at 20:10 Comment(8)
That's UTF-16 encoded (probably, could be UCS-2, but I think the BOM is only used for UTF) - Tinner
Possibly a duplicate of https://mcmap.net/q/128568/-can-i-make-git-recognize-a-utf-16-file-as-text/166955 - Proviso
UTF-8 can handle the same "foreign characters" as UTF-16 :-) - Wiggs
For people who don't like the idea of having to convert their files: a better way to view diffs is to install KDiff3 (or another diff tool), configure git to use it and use git difftool file. - Latchet
Be careful converting REG files to UTF-8. My machine (Windows XP) could not merge a REG file after conversion. - Lintel
Oh, and by merge I meant import into the registry. - Lintel
Thanks, @PatrickMcDonald. Definitely worth noting here. However, I haven't been on XP in years ;) - Anodic
@BenVoigt It's UCS-2 LE with BOM. Used for exports by some versions of regedit, no clue why. - Pashm
A
4

Quick Answer

As others have pointed out, this issue is caused by an encoding mix-up. You have two options:

  • Change the file encoding to UTF-8 by re-saving it accordingly.

  • Create a .gitattributes file, and include the following:

    *.reg working-tree-encoding=UTF-16LE-BOM eol=CRLF

Cause

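By default, registry exports from the Windows Registry Editor are saved in a particular UTF-16 encoding (little-endian, with a byte order mark). Under the hood, Git only recognizes files encoded in ASCII or one of its supersets, such as UTF-8, as text. UTF-16 stores each ASCII character as two bytes, one of which is a NUL byte, so when Git sees a UTF-16 encoded file it finds NUL bytes where it expects text and interprets the file as binary.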

Asking Git to treat the file as text by setting a *.reg diff attribute doesn't work because Git is still expecting the wrong encoding. That's why you saw all of those ^@ characters.
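To make the cause concrete, here is a small Python sketch of what those bytes look like. The looks_binary helper is a hypothetical, simplified stand-in for Git's check (a NUL byte near the start of the content), not Git's actual code, and the 8000-byte window is an assumption about how far Git looks.

```python
import codecs

def looks_binary(data: bytes, probe: int = 8000) -> bool:
    """Simplified stand-in for Git's heuristic: a NUL byte near the start means 'binary'."""
    return b"\0" in data[:probe]

text = "Windows Registry Editor Version 5.00\r\n"

# What a regedit export typically contains: a BOM (FF FE) followed by UTF-16LE code units.
utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
utf8 = text.encode("utf-8")

print(utf16[:8])            # b'\xff\xfeW\x00i\x00n\x00'
print(looks_binary(utf16))  # True
print(looks_binary(utf8))   # False
```

The \x00 after every letter is what shows up as ^@ in the diff from the question, and the leading \xff\xfe is the <FF><FE> byte order mark.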

Solutions

One solution that others have suggested is to save the UTF-16 files as UTF-8 and that totally works! It does have one big disadvantage though: if you have a lot of .reg files, or you want to re-export a key from the Registry Editor, you'll have to re-save it with the correct encoding every time.

Alternatively, you can tell Git what encoding you plan to use with the working-tree-encoding attribute. When this is specified, Git will convert a text file to UTF-8 as it is committed to the repository, and then convert it back to the original encoding as it gets checked out. That way, the file always has the original encoding when it appears in your working directory. If you're familiar with end-of-line normalization, the behavior is similar to that.
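The round trip described above can be simulated in a few lines of Python. This is only a sketch of the idea, not how Git implements it: to_repo and to_working_tree are hypothetical names, and the line-ending handling assumes the file is also marked as text with CRLF endings in the working tree.

```python
import codecs

def to_repo(working_tree_bytes: bytes) -> bytes:
    """Working tree form (UTF-16LE with BOM, CRLF) -> stored form (UTF-8, LF)."""
    text = working_tree_bytes.decode("utf-16")  # the BOM selects the byte order and is dropped
    return text.replace("\r\n", "\n").encode("utf-8")

def to_working_tree(repo_bytes: bytes) -> bytes:
    """Stored form (UTF-8, LF) -> working tree form (UTF-16LE with BOM, CRLF)."""
    text = repo_bytes.decode("utf-8").replace("\n", "\r\n")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

original = codecs.BOM_UTF16_LE + '"Mode"="jvm"\r\n'.encode("utf-16-le")
stored = to_repo(original)

print(stored)                               # b'"Mode"="jvm"\n'
print(to_working_tree(stored) == original)  # True
```

Because the stored form is plain UTF-8, diffs and merges operate on ordinary text, while the Registry Editor still sees the encoding it expects on checkout.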

If you take this route, there are a few pitfalls to be aware of:

  1. The attribute is relatively new (March 2018), so it could cause trouble if you need to support a wide range of Git implementations or versions.
  2. If you're going beyond small UTF-16 files, encoding conversion could slow things down or, depending on the encoding, not survive the round trip unscathed.

For these reasons, the documentation recommends using this attribute only if the file cannot be stored usefully as UTF-8, but depending on your use case these pitfalls may not concern you. Finally, when using this attribute, it's important to also specify which end-of-line characters are in use to avoid ambiguity. That's done with the eol attribute.

Putting it all together, I recommend you try creating a .gitattributes file in your repository's root, and including the following line:

*.reg working-tree-encoding=UTF-16LE-BOM eol=CRLF

Anselmo answered 26/8, 2021 at 3:47 Comment(4)
This is excellent. Thank you for pointing out the new working-tree-encoding attribute. - Anodic
Hey, thanks so much! I was lucky to see the attribute. Thanks for pointing out that the attribute is new, too. It led me to add some of the pitfalls listed in the documentation to my answer. - Anselmo
working-tree-encoding=UTF-16LE-BOM works great, but for some reason eol=CRLF is being ignored. Instead, the reg files are still being checked out as LF. I also get a warning that CRLF will be replaced by LF the next time Git touches it when adding a reg file. - Duwalt
It seems to work now with *.reg text eol=crlf working-tree-encoding=UTF-16LE-BOM - Duwalt
C
12

To tell git to explicitly diff a filetype, put the following in a .gitattributes file in your repository’s root directory:

*.reg diff
Castro answered 18/2, 2011 at 20:14 Comment(3)
This isn't working for me. I created a file .gitattributes and put that code in exactly. - Anodic
I needed to change it to *.reg diff (no minus). But now it looks all weird. I updated the question. - Anodic
@nathanchere Updated, seemed to have missed that comment 4 years back. - Castro
T
4

Git is treating your registry export files as binary files because they have NULs. There is no good way to diff or merge general binary files. A change of one byte can change the interpretation of the rest of the file.

There are two general approaches to handling binary files:

  1. Accept that they're binary. Diffs aren't going to be meaningful, so don't ask for them. Don't ever merge them, which means only allowing changes on one branch. In this case, this can be made easier by putting each tweak (or set of related tweaks) in a separate file, so there are fewer ways for differences to occur within one file.

  2. Store the changes as text, and convert/deconvert to these binary forms.

Even though these are "text" files, the UTF-16 encoding contains NULs. There appear to be no non-ASCII characters, however. Can you convert them to ASCII (or UTF-8, which is identical to ASCII if there are no extended characters)?
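Whether a given export really contains only ASCII characters can be checked before converting it. The following Python sketch shows one way to do that; utf16_is_ascii_only is a hypothetical helper name, and the sample byte strings stand in for real file contents.

```python
import codecs

def utf16_is_ascii_only(data: bytes) -> bool:
    """True if the UTF-16 data decodes to text made only of ASCII characters."""
    return data.decode("utf-16").isascii()  # str.isascii needs Python 3.7+

# Sample contents standing in for real exports (assumed, for illustration only).
plain = codecs.BOM_UTF16_LE + '"Timeout"=dword:0000003c\r\n'.encode("utf-16-le")
accented = codecs.BOM_UTF16_LE + '"Name"="caf\u00e9"\r\n'.encode("utf-16-le")

print(utf16_is_ascii_only(plain))     # True
print(utf16_is_ascii_only(accented))  # False
```

If the check passes, the ASCII and UTF-8 encodings of the file are byte-for-byte the same, so either conversion gives Git a file it treats as text.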

Thomasthomasa answered 18/2, 2011 at 22:20 Comment(2)
They are actually registry exports. I'm not backing up the whole registry. I edit these files in Vim and they are not binary. Here is the file in question: gist.github.com/834529 - Anodic
@sirlancelot: Oh, right. They're UTF-16 encoded. Convert them to and from UTF-8 then? - Thomasthomasa
K
3

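Create a script named utf16toascii.py:

#!/usr/bin/env python3
import sys

# Read raw bytes: the file is UTF-16, so it must not be opened in default text mode.
with open(sys.argv[-1], 'rb') as f:
    data = f.read()
# Decode UTF-16 (the BOM selects the byte order); non-ASCII characters become '?'.
ascii_bytes = data.decode('utf-16').encode('ascii', 'replace')
sys.stdout.buffer.write(ascii_bytes)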

Then in bash do:

$ echo "*.reg diff=utf16strings" >> .gitattributes
$ git config --global diff.utf16strings.textconv /path/to/utf16toascii.py

And you're good to diff registry files, as well as Xcode .strings files, or any other utf-16 file.

Katzen answered 12/2, 2014 at 19:8 Comment(1)
This doesn't work on Windows and should be called with Python 2, not 3. - Autobahn
M
2

Convert .reg files from UTF-16 to UTF-8 by opening each .reg file in Notepad and saving it with Encoding set to UTF-8.
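With many files, the same conversion can be scripted. The Python sketch below is one possible approach, not part of the original answer: convert_to_utf8 and the .utf8.reg naming scheme are hypothetical, and the demo runs on a throwaway file rather than a real export. It writes a copy and leaves the original untouched, since a converted file may no longer import into the registry.

```python
import codecs
import tempfile
from pathlib import Path

def convert_to_utf8(src: Path) -> Path:
    """Write a UTF-8 copy next to a UTF-16 .reg file and return the copy's path."""
    text = src.read_bytes().decode("utf-16")     # the BOM selects the byte order and is dropped
    dst = src.with_name(src.stem + ".utf8.reg")  # hypothetical naming scheme
    dst.write_bytes(text.encode("utf-8"))        # bytes I/O leaves CRLF line endings alone
    return dst

# Demo on a throwaway file standing in for a real regedit export.
workdir = Path(tempfile.mkdtemp())
export = workdir / "tweaks.reg"
export.write_bytes(
    codecs.BOM_UTF16_LE + "Windows Registry Editor Version 5.00\r\n".encode("utf-16-le")
)

copy = convert_to_utf8(export)
print(copy.name)          # tweaks.utf8.reg
print(copy.read_bytes())  # b'Windows Registry Editor Version 5.00\r\n'
```

The UTF-8 copy is the one to commit and diff, while the untouched UTF-16 original remains available for importing.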

Mendoza answered 26/2, 2013 at 15:58 Comment(2)
It's been a while but IIRC, I tried this and the file would no longer import for me. - Anodic
If you saved it as a different file, the old file should work as before. My method allows diff of the new file(s) after conversion, and was not meant to be importable. - Mendoza
S
1

Manually assign iconv for diffing

Another answer suggested trying *.reg working-tree-encoding=UTF-16LE-BOM eol=CRLF

This did not work for me. I'm on Windows 10 and TortoiseGit didn't realize that the file was actually unchanged.

I have a bat file that dumps the Tomcat registry keys to disk, and after running it the TortoiseGit icon would always be red. But it would immediately turn green if I ran git status on the command line. Not sure what's going on there.

So I wound up with something else. I don't touch the internal encoding, I just manually define a diffing program to use for UTF-16 files and then I manually assign .reg files to use that. This works on git-bash-for-windows for me. And I don't have the problem in Windows Explorer with the red "this-has-changed" TortoiseGit icon overlay.

There are two steps:

  1. Define diffing program:

    $ git config --global diff.utf16file.textconv "iconv --from-code=UTF-16 --to-code=UTF-8"
    
  2. Assign .reg files to use that:

    $ cat .gitattributes
    *.reg diff=utf16file
    

More details below.


I have a batch file which does this:

del ApacheSoftwareFoundation.Wow6432Node.reg
reg export "HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Apache Software Foundation" ApacheSoftwareFoundation.Wow6432Node.reg /y

Now I ran this export batch file, then I changed the "timeout" value from 60 seconds to 66 seconds via Tomcat9 GUI and then I ran the export batch file again.

Before: you get no textual diff. You just get "Binary files ... differ"

$ git config --unset --global diff.utf16file.textconv
✘

$ git config --global diff.utf16file.textconv
✘

$ git check-attr --all ApacheSoftwareFoundation.Wow6432Node.reg
✔

$ git diff ApacheSoftwareFoundation.Wow6432Node.reg
diff --git a/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg b/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg
index 8c860ae..ad840d5 100644
Binary files a/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg and b/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg differ
✔

After: you get an actual diff.

$ echo '*.reg diff=utf16file' >> .gitattributes
✔

$ git config --global diff.utf16file.textconv "iconv --from-code=UTF-16 --to-code=UTF-8"
✔

$ git config --global diff.utf16file.textconv
iconv --from-code=UTF-16 --to-code=UTF-8
✔

$ git check-attr --all ApacheSoftwareFoundation.Wow6432Node.reg
ApacheSoftwareFoundation.Wow6432Node.reg: diff: utf16file
✔

$ git diff ApacheSoftwareFoundation.Wow6432Node.reg
diff --git a/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg b/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg
index 8c860ae..ad840d5 100644
--- a/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg
+++ b/Tomcat9/bin/ApacheSoftwareFoundation.Wow6432Node.reg
@@ -202,5 +202,5 @@ Windows Registry Editor Version 5.00
 "Class"="org.apache.catalina.startup.Bootstrap"
 "Params"=hex(7):73,00,74,00,6f,00,70,00,00,00,00,00,00,00
 "Mode"="jvm"
-"Timeout"=dword:0000003c
+"Timeout"=dword:00000042
✔
Synchronism answered 16/8, 2023 at 11:31 Comment(0)
