Check in changes to the UTF-8 BOM using git
I accidentally checked in a UTF-8 encoded text file from Windows without first removing the BOM. I have now removed the BOM and tried to check in that change, but git seems to ignore the change to the BOM bytes. Is there a setting that makes git check in the file exactly as it is? (I know there is a similar issue with line endings, and there is a setting for that one.)

Giantism answered 20/6, 2011 at 20:9 Comment(1)
git leaves BOM characters well alone. Are you sure that you're removing the characters that you think you are? How are you doing this and what makes you think that git is ignoring the change?Greenshank
3

If you can make this reproducible, by all means report a bug.

Here's my two cents:

xxd -r > raw <<< "0000000: 4865 c582 c397 c3b8 0a                   He......."
cat raw # shows "Heł×ø" in UTF-8 terminals

git init .
iconv -f UTF-8 -t UTF32BE raw > test   # big-endian, written without a BOM
git add test                           # the file is untracked at first
git commit -m nobom
iconv -f UTF-8 -t UTF32 raw > test     # written with a BOM
git diff # reports: "Binary files a/test and b/test differ"
git commit -am bom

Verify different objects present:

git rev-list --objects --all
1d0cf0c1871a8743f947bd4582198db4fc1e72b1
c52c2a8c211a0031e01eef5d5121d5d0b4aabc40
4740254f8f52094afc131040afc80bb68265e78c 
fd3c513224525b3ab94a2512cbbfa918793640eb test
2d9da153c5febf0425437395227381d3a4784154 
2e54d36463fee81e89423d7d80ccc5d7003aba21 test

or, slightly more direct

for h in $(git rev-list --all -- test); do git ls-tree $h; done
100644 blob 2e54d36463fee81e89423d7d80ccc5d7003aba21    test
100644 blob 2e54d36463fee81e89423d7d80ccc5d7003aba21    test

This is with git 1.7.4.1 on 64-bit Ubuntu.


xxd test # no bom:
0000000: 0000 0048 0000 0065 0000 0142 0000 00d7  ...H...e...B....
0000010: 0000 00f8 0000 000a                      ........

xxd test # with bom
0000000: fffe 0000 4800 0000 6500 0000 4201 0000  ....H...e...B...
0000010: d700 0000 f800 0000 0a00 0000            ............
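Reading the dumps above by eye works, but the check can also be scripted. The following is only a sketch: `bom_kind` is a hypothetical helper name, not something from git or from this answer, and it uses `od` rather than `xxd` on the assumption that `od` is more likely to be installed. It looks at the first four bytes and guesses the BOM; a UTF-16LE file whose first character is NUL would be misreported as UTF-32LE.

```shell
# Hypothetical helper (not part of git): report which BOM, if any, a file starts with.
bom_kind() {
    # First four bytes as lowercase hex without separators, e.g. "efbbbf68".
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf*)  echo "UTF-8 BOM" ;;
        fffe0000) echo "UTF-32LE BOM" ;;   # must be tested before the UTF-16LE pattern
        0000feff) echo "UTF-32BE BOM" ;;
        fffe*)    echo "UTF-16LE BOM" ;;
        feff*)    echo "UTF-16BE BOM" ;;
        *)        echo "no BOM" ;;
    esac
}

# Example on a throwaway file (the path is made up for the demo).
demo_dir=$(mktemp -d)
printf '\357\273\277hello\n' > "$demo_dir/with_bom.txt"
bom_kind "$demo_dir/with_bom.txt"
```

Running `bom_kind` on the file before and after saving it in the editor shows whether the editor really removed the bytes.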

Ulrike answered 20/6, 2011 at 23:12 Comment(1)
Many thanks for showing me how to figure it out. Indeed, git does not change the BOM characters. It was a problem with my editor on the Windows machine. Thank you!Giantism
2

git does not ignore the byte order mark (BOM), and it is possible to commit a change that only removes the BOM. Tested with a UTF-8 XML file.

I removed the BOM on Windows in Visual Studio 2017 through File -> Save As -> Save with Encoding -> Unicode (UTF-8 without signature). git sees the change, and it can be committed.
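The same check can be reproduced from a shell without an editor. This is a minimal sketch, assuming a POSIX shell with git on the PATH; the temporary repository, the file name `test.txt`, and the `demo` commit identity are all made up for the demo.

```shell
# Sketch: show that git records a change that only removes a UTF-8 BOM.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Write a file that starts with the UTF-8 BOM (EF BB BF), then commit it.
printf '\357\273\277hello\n' > test.txt
git add test.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'with BOM'

# Drop the first three bytes (the BOM) and see what git reports.
tail -c +4 test.txt > test.tmp && mv test.tmp test.txt
bom_status=$(git status --porcelain -- test.txt)
echo "$bom_status"

# Commit the BOM-only change and compare the two blob ids.
git -c user.name=demo -c user.email=demo@example.com commit -q -am 'without BOM'
old_blob=$(git rev-parse HEAD~1:test.txt)
new_blob=$(git rev-parse HEAD:test.txt)
echo "$old_blob"
echo "$new_blob"
```

If `git status` lists the file as modified and the two blob ids differ, git has stored the BOM removal as a real change.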

Vadose answered 24/9, 2019 at 17:36 Comment(0)
-1

If you can't find a proper solution, you can always add a character to the file, commit, then remove both the BOM and the added character and amend the commit.

Dennett answered 20/6, 2011 at 20:17 Comment(2)
? What. Let's first establish whether the problem existsUlrike
@Ulrike You are correct. With git v1.7.4 I was able to commit a UTF-8 encoded file with only BOM removed.Dennett
