Git has recently begun to understand encodings such as UTF-16.
See the gitattributes documentation and search for working-tree-encoding.
[Make sure your man page matches, since this is quite new: the attribute was added in Git 2.18.]
If (say) the file is UTF-16 without a BOM on a Windows machine, add this to your .gitattributes file:
*.vmc text working-tree-encoding=UTF-16LE eol=CRLF
If it is UTF-16 (with a BOM) on *nix, make it:
*.vmc text working-tree-encoding=UTF-16-BOM eol=LF
(Replace *.vmc with *.whatever for whatever type of files you need to handle.)
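If you are not sure whether a file carries a BOM, one way to find out is to look at its first few bytes. The following is a minimal Python sketch, not something git provides; the helper name sniff_bom and the labels it returns are made up for illustration. It only inspects the leading bytes, so it cannot tell BOM-less UTF-16 apart from any other BOM-less encoding.

```python
import codecs

# Byte-order marks, longest first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so the UTF-32 marks must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 little-endian, with BOM"),
    (codecs.BOM_UTF32_BE, "UTF-32 big-endian, with BOM"),
    (codecs.BOM_UTF8, "UTF-8, with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian, with BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian, with BOM"),
]


def sniff_bom(data: bytes) -> str:
    """Describe the BOM at the start of data (hypothetical helper)."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return "no BOM found"


def sniff_file(path: str) -> str:
    """Read the first four bytes of a file and describe its BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))


# Small in-memory demo: the same text with and without a BOM.
with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
without_bom = "hi".encode("utf-16-le")
print(sniff_bom(with_bom))
print(sniff_bom(without_bom))
```

For a real file, call sniff_file on its path and pick the working-tree-encoding value that matches the result.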
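See also the Git change titled: Support working-tree-encoding "UTF-16LE-BOM". It added UTF-16LE-BOM as an accepted value for little-endian files that carry a BOM.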
Added later
Following @Hackslash, one may find that this is insufficient:
*.vmc text working-tree...
To get nice text diffs you need:
*.vmc diff working-tree...
Putting both works as well:
*.vmc text diff working-tree...
But it's arguably
- Redundant: eol=... implies text
- Verbose: a large project could easily have dozens of different text file types
The problem
Git has a built-in macro attribute binary, which expands to -diff -merge -text. The opposite (text and diff both set) is not available built in, but git gives the tools (I think!) for synthesizing it.
The solution
Git allows one to define new macro attributes.
I'd propose that at the top of your top-level .gitattributes file (custom macros can only be defined in top-level gitattributes files) you have
[attr]textfile text diff
Then, for all paths that need to be text and diff, do
path textfile working-tree-encoding= eol=...
Note that in most cases we would want the default encoding (UTF-8) and the default eol (native), so both settings may be dropped.
Most lines should look like
*.c textfile
*.py textfile
etc.
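To make the effect of the macro concrete, here is a toy Python sketch that expands [attr] macros in gitattributes-style lines. It is an illustration only, not git's implementation: the function name expand_macros is made up, it does no pattern matching or quoting, and it simply replaces a macro name with its expansion. The UTF-16LE and CRLF values in the sample input are borrowed from the example near the top of this answer.

```python
def expand_macros(lines):
    """Expand [attr] macros in gitattributes-style lines (toy version).

    Returns a list of (pattern, attributes) pairs, with every macro name
    replaced by the attributes it was defined to stand for.
    """
    macros = {}
    result = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        first, *rest = line.split()
        if first.startswith("[attr]"):
            # Macro definition: remember the name and its expansion.
            macros[first[len("[attr]"):]] = rest
            continue
        attrs = []
        for attr in rest:
            # A macro name is swapped for its expansion; anything else
            # is kept as written.
            attrs.extend(macros.get(attr, [attr]))
        result.append((first, attrs))
    return result


sample = [
    "[attr]textfile text diff",
    "*.c textfile",
    "*.py textfile",
    "*.vmc textfile working-tree-encoding=UTF-16LE eol=CRLF",
]
for pattern, attrs in expand_macros(sample):
    print(pattern, " ".join(attrs))
```

In a real repository, git check-attr can report which attributes git itself assigns to a given path.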
Why not just use diff?
Practical: In most cases we want native eol, which means no eol=... setting. So text won't get implied and needs to be set explicitly.
Conceptual: Text vs. binary is the fundamental distinction. eol, encoding, diff, etc. are just some aspects of it.
Disclaimer
Due to the bizarre times we are living in, I don't have a machine with a current working git, so I'm unable at the moment to check the latest addition. If someone finds something wrong, I'll emend or remove it.