Problem printing Unicode from Java code in the Windows console
C

4

7

I have a problem printing a Unicode symbol in the Windows console.

Here's the Java code that prints the symbol:

System.out.print("\u22A2 ");

The problem doesn't occur when I run the program in Eclipse with the encoding set to UTF-8; in the Windows console, however, the symbol is replaced by a question mark.

I tried the following to overcome the problem, without success:

  • Changing the Windows console font to Lucida Console.

  • Changing the code page every time I open the Windows console, i.e. running chcp 65001.

As an extra step, I also tried a few times to run the class with an encoding argument, i.e. java -Dfile.encoding=UTF-8 Filter (where "Filter" is the name of the class).

Cranach answered 4/12, 2013 at 21:27 Comment(5)
Are you sure the console is running in Unicode? It could be win-1252 or something. – Circosta
I'm guessing you've already read this: #8669556 – Gynecocracy
I have no idea how to check that. I've seen a screenshot of somebody's console where the Options dialog showed which encoding was in use, but mine doesn't show it. – Cranach
@GGrec No, I hadn't come across it, since it's about input. – Cranach
The MS C runtime doesn't support UTF-8; even if you chcp to 65001 in the console you will likely hit app-breaking bugs. There is no reliable way to get Unicode stdout to the Windows console. If you absolutely must, there is the Win32 API WriteConsoleW, but it obviously only works on Windows, it needs careful handling to detect whether you're actually talking to the Windows console, some other console, or a file or pipe, and you can't call it in pure Java (you need JNA). – Cuneiform
H
9

By default, the Windows CMD console uses an OEM code page such as 437 (the exact value depends on the system locale). You can check it by running this command at the prompt:

C:\>chcp
Active code page: 437

This code page prevents Unicode characters from being displayed properly. You have to change the code page to 65001 AND pass -Dfile.encoding=UTF-8 to the JVM:

C:\>chcp 65001
Active code page: 65001
C:\>java -jar -Dfile.encoding=UTF-8 path/to/your/runnable/jar
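
To confirm that the flag actually reached the JVM, a small check along these lines may help. This is only a sketch: the class name EncodingCheck is made up for illustration, and it reports what the JVM believes, not what the console window will render.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original answer.
final class EncodingCheck {
    // Describes the encoding settings the running JVM actually sees.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 (after chcp 65001) shows whether the setting took effect in that console session.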
Hat answered 30/10, 2015 at 10:58 Comment(0)
Q
6

In addition to the steps you have taken, you also need a PrintStream/PrintWriter that encodes the printed characters as UTF-8.

Unfortunately, the Java designers chose to open the standard streams with the so-called "default" encoding, which is almost always unusable*) under Windows. Hence, using System.out and System.err naively makes your program's output appear differently depending on where you run it. This goes directly against the goal of "compile once, run anywhere".

*) It will be some non-standard "code page" that nobody except Microsoft recognizes. And AFAIK, if for example you have a German keyboard and a "German" OEM Windows, and you want the date and time in your home time zone, there is just no way to say: but I want UTF-8 input/output in my CMD window. This is one reason why I have my dual-boot Ubuntu running most of the time, where it goes without saying that the terminal does UTF-8.
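
Where the question mark comes from can be illustrated with a short sketch. It assumes ISO-8859-1 as a stand-in for whatever legacy code page the JVM picked (the real default on a given Windows machine may differ), and the class name QuestionMarkDemo is invented for this example. A charset that has no mapping for U+22A2 silently substitutes '?' (byte 3F) when encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 stands in for a legacy default code page.
final class QuestionMarkDemo {
    // Encodes the text with the given charset and returns the bytes as hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String symbol = "\u22A2";
        // Unmappable character is replaced by '?': prints "ISO-8859-1: 3F"
        System.out.println("ISO-8859-1: " + hex(symbol, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent it: prints "UTF-8: E2 8A A2"
        System.out.println("UTF-8: " + hex(symbol, StandardCharsets.UTF_8));
    }
}
```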

The following usually works for me in JDK7:

public static PrintWriter stdout = new PrintWriter(
    new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
    true);

For ancient Java versions, I replace StandardCharsets.UTF_8 with Charset.forName("UTF-8").
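
As a self-contained sketch of the same idea, the wrapper can be put into a small helper so that it also works on streams other than System.out. The class and method names (Utf8Stdout, utf8Writer) are made up for illustration. Whether the glyph then shows up correctly still depends on the console's code page and font.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, shown only to make the snippet above runnable.
final class Utf8Stdout {
    // Wraps any byte stream in an auto-flushing writer that encodes as UTF-8.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        PrintWriter stdout = utf8Writer(System.out);
        // println (unlike print) triggers the auto-flush of a PrintWriter.
        stdout.println("\u22A2");
    }
}
```

Note that a PrintWriter with auto-flush enabled only flushes on println, printf, and format; after a plain print call, stdout.flush() has to be called explicitly or the output may never reach the console.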

Quinsy answered 4/12, 2013 at 22:4 Comment(4)
Thanks for your reply. I gave it a go with a PrintStream, but it doesn't seem to solve the problem. Perhaps I'm doing something wrong, but here's what I did: PrintStream sysout = new PrintStream(System.out, true, "UTF-8"); sysout.print("\u22A2 "); Once again it works fine in Eclipse, but not in the Windows console. – Cranach
Look at my edit, @Adrian. It should then print your string as ⊢ – Quinsy
I really appreciate your help, but I have no idea why it still isn't working. I followed your solutions very closely and tried both charsets, and it just isn't working. I always try every option I can think of. I even compiled the Java files with an explicit encoding, using javac -encoding utf8 *.java – Cranach
Last time I tried this solution, too many characters were printed on the console (Windows XP). The best luck I've had is piping STDOUT through another application. – Syck
H
0

I struggled with the same problem for a long time, but I believe I have finally found a solution, even if it is not a very pretty one.

As far as I can tell, the problem actually consists of two issues:

  • The Windows console code page is incorrect by default
  • System.out uses an incorrect encoding by default

Adjust code page from Java

The first issue can be observed using cmd or powershell and running chcp:

Active code page: 850.

This should be 65001 for UTF-8, which can be set using chcp 65001. However, this only works if you can run a command in the shell your program runs in, or if you edit the registry Autorun field (neither is a great option, imo). And no, you can't run Runtime.getRuntime().exec("chcp.com 65001"), because that doesn't affect the calling console, only the one created by running the command.

My suggestion is to use the native Windows function SetConsoleOutputCP(), which lets you change the code page from within, and scoped to, your application. I simply used JNA, but it would probably be cleaner to write a small native C wrapper so that you only pull in the one function:

Kernel32.INSTANCE.SetConsoleOutputCP(65001)

Change encoding of System.out

I found this issue when printing System.getProperties():

...
stdout.encoding=Cp1252
...

This differs from the actual console encoding (850 in my case), and it is not UTF-8 either. Mind you, this was tested using Java 21, which apparently uses UTF-8 by default, but clearly not everywhere.
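
Rather than scanning the whole System.getProperties() dump, the relevant entries can be pulled out directly. The following is a sketch with an invented class name (StreamEncodingProps). It assumes the property names file.encoding, native.encoding, stdout.encoding and stderr.encoding; some of these only exist on newer JDKs, so a value may simply come back as null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for inspecting encoding-related system properties.
final class StreamEncodingProps {
    private static final String[] KEYS = {
        "file.encoding", "native.encoding", "stdout.encoding", "stderr.encoding"
    };

    // Collects the properties in a fixed order; missing ones map to null.
    static Map<String, String> collect() {
        Map<String, String> props = new LinkedHashMap<>();
        for (String key : KEYS) {
            props.put(key, System.getProperty(key));
        }
        return props;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```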

Again, this could probably be fixed by adding some startup parameter to set these properties, but you may as well create your own print stream:

new PrintStream(System.out, true, StandardCharsets.UTF_8)

which you can install as the global System.out using System.setOut().

Putting it all together

This is my suggestion for fixing System.out and System.err in a platform-independent manner:

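import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.Kernel32Util;

public static void fixSystemOutEncoding() {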
    if(
        System.console() == null || // No interactive terminal connected (maybe you still want to do it there?)
        !System.getProperty("os.name").toLowerCase().contains("win") // Not on Windows
    ) {
        // No console or no Windows -> nothing to do
        return;
    }

    try {
        // Set console code page to 65001 = UTF-8
        if(Kernel32.INSTANCE.SetConsoleOutputCP(65001)) {
            // Replace System.out and System.err with PrintStreams using UTF-8
            System.setOut(new PrintStream(System.out, true, StandardCharsets.UTF_8));
            System.setErr(new PrintStream(System.err, true, StandardCharsets.UTF_8));
        }
        else {
            // SetConsoleOutputCP() failed, throw exception with error message,
            // handle it in catch (you may want to do something else here or
            // just ignore it)
            throw new RuntimeException(Kernel32Util.getLastErrorMessage());
        }
    } catch(Throwable t) {
        // Something went wrong, probably with the native library
        // Probably just ignore it and deal with UTF-8 not being available
    }
}

Requires JNA and JNA Platform (net.java.dev.jna:jna-platform).

Hann answered 2/7, 2024 at 18:54 Comment(0)
T
-2

For the Arabic language I used the following code:

PrintWriter stdout = new PrintWriter(
new OutputStreamWriter(System.out,StandardCharsets.ISO_8859_1),true);
Tahoe answered 4/4, 2015 at 15:52 Comment(0)
