Character encoding with msysgit

Commit messages created on my winXP box generate warnings when read on my Win7 box.

My name contains a special character (ö), and I suppose this is the source of the problem, since my name is in the commit. I saw the problem while trying to stash changes on top of a commit created on winXP: Warning: commit message does not conform to UTF-8.

I would like to check what encoding was used to generate the commit on winXP, but can't find how.

$ git config --get i18n.commitencoding returns blank on both machines.

http://www.kernel.org/pub/software/scm/git/docs/git-commit.html seems to say that git checks the encoding in the commit objects.

git log, git show, git blame and friends look at the encoding header of a commit object, and try to re-code the log message into UTF-8 unless otherwise specified.

That is fine, but then why does git complain on win7 and not on winXP?


msysgit versions are identical on both machines: 1.7.4.msysgit.0.

Renn answered 23/6, 2011 at 9:49 Comment(1)
Which software generated the file name with the special character "o:"? msys, or native Windows apps? – Pardew

Just a wild guess but I had a similar problem with letters in someone's name in a Rakefile recently and I actually had to change the encoding of my CMD environment to run it.

Look at step number two on this wiki:

https://github.com/NancyFx/Nancy/wiki/Having-trouble-with-rake%3F

The Microsoft documentation on the chcp command is here: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true
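To illustrate why the active code page matters, here is a minimal Python sketch. It assumes a hypothetical author name, "Björn", and the two code pages commonly seen on Western European Windows installs (cp850 for the console, cp1252 for GUI apps). Which code page either machine in the question actually used is not known. It only shows that the same "ö" becomes different bytes, and that only the UTF-8 form passes a UTF-8 validity check.

```python
# Illustrative only: the name and the chosen code pages are assumptions,
# not something taken from the machines in the question.
name = "Björn"  # hypothetical author name containing "ö"

# Encode the same text under three encodings.
encodings = ["cp850", "cp1252", "utf-8"]
encoded = {enc: name.encode(enc) for enc in encodings}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for enc, raw in encoded.items():
    print(f"{enc:7} {raw.hex()}  valid UTF-8: {is_valid_utf8(raw)}")
```

If a commit message or author name is written with one of the single-byte code pages and no encoding header is recorded, a reader that expects UTF-8 sees invalid bytes, which is consistent with the warning quoted in the question.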

Whoa answered 15/11, 2011 at 23:13 Comment(0)

i18n.commitEncoding is better supported in modern Git (2019), but only Git 2.25 (Q1 2020) provides full support: handling of commit objects that use a non-UTF-8 encoding during "rebase -i" has been improved.

See commit 52f52e5, commit 5772b0c (11 Nov 2019), commit b375744, commit 019a9d8, commit 0798d16, commit e4b95b3, commit 1ba6e7a (08 Nov 2019), and commit 99b2ba3 (07 Nov 2019) by Doan Tran Cong Danh (congdanhqx-zz).
(Merged by Junio C Hamano -- gitster -- in commit 6511cb3, 01 Dec 2019)

sequencer: reencode old merge-commit message

Signed-off-by: Doan Tran Cong Danh

During rebasing, old merge's message (encoded in old encoding) will be used as message for new merge commit (created by rebase).

In case of the value of i18n.commitencoding has been changed after the old merge time. We will receive an unusable message for this new merge.

Correct it.
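As a rough illustration of the idea (not Git's actual sequencer code), the Python sketch below reads the optional "encoding" header of a raw commit object, decodes the message accordingly, and re-encodes it for a new target encoding. The sample commit, its author, and the function names are hypothetical; the header layout (header lines, a blank line, then the message) follows the commit object format.

```python
def read_commit_message(raw: bytes) -> str:
    """Decode a raw commit's message using its 'encoding' header (default UTF-8)."""
    header, _, body = raw.partition(b"\n\n")
    encoding = "utf-8"  # assumed when no encoding header is present
    for line in header.split(b"\n"):
        if line.startswith(b"encoding "):
            encoding = line[len(b"encoding "):].decode("ascii")
    return body.decode(encoding)


def reencode_message(raw: bytes, target: str) -> bytes:
    """Re-encode the old commit's message for a new commit in `target` encoding."""
    return read_commit_message(raw).encode(target)


# Hypothetical old merge commit whose message was stored as ISO-8859-1.
old_commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1308822540 +0200\n"
    b"committer A U Thor <author@example.com> 1308822540 +0200\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    + "Merge branch 't\u00f6pic'\n".encode("iso-8859-1")
)

print(read_commit_message(old_commit))
print(reencode_message(old_commit, "utf-8"))
```

Reusing the old bytes unchanged under a different encoding setting would mislabel them, which is the "unusable message" case the commit describes.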


sequencer: reencode to utf-8 before arrange rebase's todo list

Signed-off-by: Doan Tran Cong Danh

On musl libc, ISO-2022-JP encoder is too eager to switch back to 1 byte encoding, musl's iconv always switch back after every combining character.
Comparing glibc and musl's output for this command

$ sed q t/t3900/ISO-2022-JP.txt| iconv -f ISO-2022-JP -t utf-8 |
        iconv -f utf-8 -t ISO-2022-JP | xxd

glibc: 
00000000: 1b24 4224 4f24 6c24 5224 5b24 551b 2842  .$B$O$l$R$[$U.(B
00000010: 0a                                       .

musl: 
00000000: 1b24 4224 4f1b 2842 1b24 4224 6c1b 2842  .$B$O.(B.$B$l.(B
00000010: 1b24 4224 521b 2842 1b24 4224 5b1b 2842  .$B$R.(B.$B$[.(B
00000020: 1b24 4224 551b 2842 0a                   .$B$U.(B.

Although musl iconv's output isn't optimal, it's still correct.
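The claim that both outputs are correct can be checked with a small Python sketch. It copies the two hex dumps above and decodes them with Python's own `iso2022_jp` codec, which stands in for iconv here as an independent decoder (an assumption of this example, not part of the commit).

```python
# Byte sequences copied from the xxd dumps above.
glibc_bytes = bytes.fromhex(
    "1b24 4224 4f24 6c24 5224 5b24 551b 2842 0a"
)
musl_bytes = bytes.fromhex(
    "1b24 4224 4f1b 2842 1b24 4224 6c1b 2842"
    "1b24 4224 521b 2842 1b24 4224 5b1b 2842"
    "1b24 4224 551b 2842 0a"
)

# Decode both with Python's ISO-2022-JP codec.
glibc_text = glibc_bytes.decode("iso2022_jp")
musl_text = musl_bytes.decode("iso2022_jp")

print(len(glibc_bytes), len(musl_bytes))  # byte lengths differ
print(glibc_text == musl_text)            # decoded text is compared here
```

The musl form is longer because it returns to ASCII (`ESC ( B`) and re-enters JIS X 0208 (`ESC $ B`) around every character, but a byte-for-byte comparison of the two encoded forms would still report a mismatch, which is why comparing after re-encoding to UTF-8 is the safer approach.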

From commit 7d509878b8 ("pretty.c: format string with truncate respects logOutputEncoding", 2014-05-21, Git v2.1.0-rc0 -- merge listed in batch #3), we're encoding the message to utf-8 first, then format it and convert the message to the actual output encoding on git commit --squash.

Thus, t3900::test_commit_autosquash_flags is failing on musl libc.

Reencode to utf-8 before arranging rebase's todo list.


configure.ac: define ICONV_OMITS_BOM if necessary

Signed-off-by: Doan Tran Cong Danh

From commit 79444c9294 ("utf8: handle systems that don't write BOM for UTF-16", 2019-02-12, Git v2.21.0-rc1 -- merge listed in batch #0), we're supporting those systems with iconv that omits BOM with:

make ICONV_OMITS_BOM=Yes

However, configure script wasn't taught to detect those systems.

Teach configure to do so.

Titillate answered 9/12, 2019 at 7:53 Comment(0)
