How to do proper Unicode and ANSI output redirection on cmd.exe?

If you are doing automation on Windows and you are redirecting the output of different commands (internal cmd.exe ones or external), you'll discover that your log files contain mixed Unicode and ANSI output (meaning they are invalid and will not load correctly in viewers/editors).

Is it possible to make cmd.exe work with UTF-8? This question is not about display, it is about stdin/stdout/stderr redirection and Unicode.

I am looking for a solution that would allow me to:

  • redirect the output of internal commands to a file using UTF-8
  • redirect the output of external commands that support Unicode to files, also encoded as UTF-8.

If it is impossible to obtain this kind of consistency using batch files, is there another way of solving this problem, like using Python scripting? In that case, I would like to know if it is possible to do the Unicode detection automatically (the user of the script should not have to remember whether the called tools output Unicode or not; the script should just convert the output to UTF-8).

For simplicity, we'll assume that if a tool's output is not Unicode it will be treated as UTF-8 (no code page conversion).
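
For concreteness, here is a minimal Python sketch of the kind of detection I have in mind (the function name, the log file name and the BOM/NUL heuristic are only illustrative, not a known-good solution):

import subprocess

def run_and_log_utf8(command, log_path):
    # Capture the raw bytes the tool writes; do not let Python decode them.
    raw = subprocess.run(command, capture_output=True).stdout
    # Heuristic: UTF-16LE output (e.g. from "cmd /u") starts with a BOM or
    # contains NUL bytes early on.
    if raw.startswith(b"\xff\xfe") or b"\x00" in raw[:256]:
        text = raw.decode("utf-16-le").lstrip("\ufeff")
    else:
        # Per the assumption above, anything else is treated as UTF-8.
        text = raw.decode("utf-8", errors="replace")
    # newline="" keeps the tool's own CRLF line endings untouched.
    with open(log_path, "a", encoding="utf-8", newline="") as log:
        log.write(text)

run_and_log_utf8(["cmd", "/u", "/c", "dir"], "build.log")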

Vegetable answered 24/4, 2010 at 20:56 Comment(0)

You can use chcp to change the active code page. This will be used for redirecting text as well:

chcp 65001

Keep in mind, though, that this will have no effect if cmd was started with the /u switch, which forces Unicode (UTF-16LE) output for redirection. If that switch is active, all output will be in UTF-16LE, regardless of the code page set with chcp.
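
As a rough illustration of the difference, here is a small Python sketch (the echoed text is arbitrary and the exact bytes can vary with console setup and Windows version):

import subprocess

# Started with /u: internal commands write UTF-16LE into the redirected pipe.
utf16 = subprocess.run("cmd /u /c echo äöü", capture_output=True).stdout
print(utf16)                       # typically b'\xe4\x00\xf6\x00\xfc\x00\r\x00\n\x00'
print(utf16.decode("utf-16-le"))

# Started normally, but with the code page switched to 65001 (UTF-8) first.
utf8 = subprocess.run('cmd /c "chcp 65001 >nul & echo äöü"', capture_output=True).stdout
print(utf8)                        # typically b'\xc3\xa4\xc3\xb6\xc3\xbc\r\n'
print(utf8.decode("utf-8"))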

Also note that the console will be unusable for interactive output when set to Raster Fonts. I'm getting fun error messages in that case:

C:\Users\Johannes Rössel\Documents>x
Active code page: 65001

The system cannot write to the specified device.

So either use a sane setup (a TrueType font for the console) or don't pull this stunt when using the console interactively while your path contains non-ASCII characters.

Tadzhik answered 25/4, 2010 at 14:37 Comment(5)
If you research a little more, you will find that the UTF-8 codepage is not supported on Windows, on any version. So chcp 65001 makes no sense.Vegetable
@Sorin: It does work, but not reliably, and it is not supported. If you have UTF-8-encoded batch files to run (without the BOM), you can do so with this.Tadzhik
There is a major bug in using UTF-8 as the ANSI codepage: the WriteFile() API returns the number of codepoints written instead of the number of bytes written, which is what is documented. This API is ultimately called by most C library functions such as printf() and by most scripting languages, including Perl, PHP, and Ruby. Any code which checks that a write was successful by comparing the number of bytes sent with the number returned will fail. Code which uses the number returned to move the output cursor will produce garbled text when printing non-ASCII text.Topflight
Like Joey said, it's inconsistent and unreliable. For example, I was just trying to redirect the output of a PowerShell script (run from cmd) to a text file. It kept outputting ANSI, so any non-ASCII characters were incorrect. I used chcp 65001 to change the codepage and bingo! the text file then contained the correct Unicode characters. However, when I created a UTF-8 batch file with something as simple as echo ‽, it displayed incorrectly with codepage 437 and not at all with 65001. Redirecting to a file caused no output with codepage 65001 and correct output with codepage 437. ಠ_ఠPhotoemission
@Synetech: The correct output for CP437 is to be expected. CMD just writes the exact same bytes that are in the batch file in that case. It doesn't care about the fact that it's actually interpretable as UTF-8.Tadzhik
binmode(STDOUT, ":unix");

without

use encoding 'utf8';

Helped me. With use encoding 'utf8' I was getting a "Wide character in print" warning.

Larder answered 4/6, 2013 at 0:15 Comment(0)
