Extended ASCII doesn't work in console!

For example, System.out.println("╚"); displays as a ?, and so does System.out.println("\u255a");

Why doesn't this work? Stdout does indeed support these characters so I don't get it.

Reynold answered 24/2, 2011 at 4:7 Comment(9)
1) "Extended Ascii" is utterly meaningless. 2) That’s obviously not your code. 3) What OS/terminal? - Ite
Command prompt in Windows. And Extended ASCII is what it is called. - Reynold
@jleedev tell OP how you really feel... - Ideational
@DasWood by "Extended ASCII" are you referring to non-printing characters? Because, if that's the case, then yeah they won't display. Most consoles to my knowledge will only display in the 0-127 range but I've never really researched it so I could be wrong. - Ideational
Since System.out.println(╚); isn't valid Java, please show the REAL code -- how did you generate the box-drawing character in code? - Murderous
If I type it directly into the console it works but if I put it into a text file and 'type text.txt' I get garbage. I've seen this done before so formatting can be done. I wonder how. - Reynold
@Jim Garrison that is what I have in my IDE, I tried that and \u255a, the box drawing set using character map set to DOS. - Reynold
You can't have a naked unquoted character in System.out.println() -- please post the actual code. - Murderous
From Wikipedia: "The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue." - Closefisted

See this question. When Java’s default character encoding is not UTF-8 (as seems to be the case on Windows and OS X, but not Linux), characters that fail to encode are converted to question marks. You can pass the appropriate switch to the JVM’s command line (-Dfile.encoding=UTF-8 works on some terminals, but I don’t have a Windows box in front of me), or you can set an environment variable. Determining the right value portably might be impossible, but if you know that you will always run on the Win32 console, for example, you can choose a Charset and explicitly encode the characters before writing them to standard output, or you can write the bytes you need directly.
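A minimal sketch of both points, in case it helps: the first part shows the question-mark substitution using ISO-8859-1 as a stand-in for a charset that lacks box-drawing characters, and the second writes explicitly encoded bytes to standard output. The class name EncodeForConsole and the choice of IBM437 are my assumptions, not something from the question; IBM437 only makes sense if the console really is on code page 437, and not every Java runtime ships that charset, hence the isSupported check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeForConsole {
    // Encode text with an explicitly chosen charset instead of the JVM default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String corner = "\u255a"; // the box-drawing corner from the question

        // ISO-8859-1 cannot represent U+255A, so the encoder substitutes '?'.
        byte[] lossy = encode(corner, StandardCharsets.ISO_8859_1);
        System.out.println((char) lossy[0]); // prints ?

        // Assumption: the console is on code page 437 (check with `chcp`).
        String consoleCharset = "IBM437";
        if (Charset.isSupported(consoleCharset)) {
            byte[] raw = encode(corner, Charset.forName(consoleCharset));
            // Write the raw bytes, bypassing System.out's own character encoding.
            System.out.write(raw, 0, raw.length);
            System.out.println();
            System.out.flush();
        }
    }
}
```

If the console is on a different code page, the charset name would need to change to match it.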

Ite answered 24/2, 2011 at 4:21 Comment(0)

The Windows command prompt uses old DOS OEM encodings by default. System.out uses the default system encoding, which will be a Windows "ANSI" encoding. However, System.console() detects the encoding of the console.
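As a rough sketch of the System.console() route: write through the Console when one is attached and fall back to a plain stream otherwise (System.console() can return null, for example when output is redirected). The class and method names here are made up for illustration.

```java
import java.io.Console;
import java.io.PrintStream;

class ConsoleOrStdout {
    // Prefer the Console's writer; use the fallback stream when no console
    // is attached to the JVM.
    static void printLine(Console console, PrintStream fallback, String text) {
        if (console != null) {
            console.writer().println(text);
            console.flush();
        } else {
            fallback.println(text);
        }
    }

    public static void main(String[] args) {
        printLine(System.console(), System.out, "\u255a\u2550\u255d");
    }
}
```

Whether the characters actually render still depends on the console's code page and font.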

U+255A (╚) is more likely to be supported by the OEM code pages than by the "ANSI" ones, because the Windows "ANSI" code pages use those byte ranges for accented characters instead.
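To make the difference concrete, here is a small sketch that decodes the same byte under an OEM code page and an "ANSI" one. In code page 437 the byte 0xC8 is ╚ (U+255A), while in windows-1252 it is È (U+00C8). The class name is hypothetical, and the charset names are assumptions about what your runtime provides, so the code reports when one is missing. It prints code points rather than glyphs so the output does not depend on the console encoding.

```java
import java.nio.charset.Charset;

class CodePageCompare {
    // Decode one byte under the named charset; returns null when the
    // runtime does not provide that charset.
    static String decodeByte(int value, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return new String(new byte[] {(byte) value}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"IBM437", "windows-1252"}) {
            String decoded = decodeByte(0xC8, name);
            String shown = (decoded == null)
                    ? "(charset not available)"
                    : String.format("U+%04X", (int) decoded.charAt(0));
            System.out.println(name + ": 0xC8 -> " + shown);
        }
    }
}
```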

You can read more here, here, here and here.

Personally, I would avoid the -Dfile.encoding option with codepage 65001 as this produces unintended side-effects in both the console (batch files stop working) and Java (bugs).

Lucrecialucretia answered 24/2, 2011 at 8:45 Comment(0)

If you are using Windows, the console is not UTF-8 but UTF-16, which is the same native encoding that Java uses, so you should be able to print wide-character strings directly.

I'm not a Java programmer, but in C you have to call _setmode() with the special mode _O_U16TEXT before printing UTF-16 will actually work.

If you want to print multibyte character strings instead, you can set the Windows console to UTF-8 from the command line with chcp 65001, or programmatically with the Win32 API SetConsoleOutputCP(). Beware of a bug, though: WriteFile() returns the number of characters written instead of the number of bytes written, contrary to its documentation. This bug causes UTF-8 output on the Windows console to be corrupted from Perl, PHP and Ruby. I believe even MSVCRT falls victim.
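On the Java side, a hedged sketch of the UTF-8 route would be to wrap standard output in a PrintStream with an explicit encoding, assuming the console has already been switched with chcp 65001. The class and method names are invented for the example, and this does nothing about the WriteFile() issue described above.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Stdout {
    // Wrap a byte stream so that everything printed to it is encoded as UTF-8,
    // regardless of the JVM's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the console was switched to UTF-8 beforehand (chcp 65001).
        PrintStream out = utf8Stream(new FileOutputStream(FileDescriptor.out));
        out.println("\u255a");
    }
}
```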

Good luck!

Wethington answered 24/2, 2011 at 7:22 Comment(0)
