Changing the default encoding for String(byte[])

3

10

Is there a way to change the encoding used by the String(byte[]) constructor?

In my own code I use String(byte[], String) to specify the encoding, but I am also using an external library that I cannot change.

String src = "with accents: é à";
byte[] bytes = src.getBytes("UTF-8");
System.out.println("UTF-8 decoded: "+new String(bytes,"UTF-8"));
System.out.println("Default decoded: "+new String(bytes));

The output for this is:

UTF-8 decoded: with accents: é à
Default decoded: with accents: Ã© Ã

I have tried changing the system property file.encoding but it does not work.
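To make the symptom reproducible regardless of the machine's settings, here is a minimal sketch that decodes the same UTF-8 bytes with an explicitly wrong charset. ISO-8859-1 is only an assumption standing in for whatever the platform default happens to be, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes the text as UTF-8, then deliberately decodes it as ISO-8859-1.
    // Each UTF-8 byte becomes one character, so every accented letter
    // turns into two characters.
    static String decodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same charset, so the text survives intact.
    static String decodeAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String src = "with accents: \u00e9 \u00e0";
        System.out.println("UTF-8 decoded:      " + decodeAsUtf8(src));
        System.out.println("ISO-8859-1 decoded: " + decodeAsLatin1(src));
    }
}
```

With a single-byte default such as ISO-8859-1, é (bytes C3 A9) comes back as the two characters U+00C3 U+00A9, which is the kind of garbling new String(bytes) produces when the default charset does not match the bytes.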

Topsoil answered 17/9, 2008 at 9:6 Comment(0)
7

You need to change the locale before launching the JVM; see:

Java, bug ID 4163515

Some sources imply you can also do this by setting the file.encoding system property when launching the JVM, for example:

java -Dfile.encoding=UTF-8 ...

...but I haven't tried this myself. The safest way is to set the locale through an environment variable in the operating system (on Unix-like systems, typically LANG or LC_ALL).
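Whichever way the JVM is launched, it helps to check what it actually picked up. The following is a small sketch for that, not a definitive diagnostic; the class and method names are hypothetical. Note that sun.jnu.encoding is an implementation-specific property and may be absent on some JVMs, and that on JDK 18 and later the default charset is UTF-8 unless overridden.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetCheck {
    // Decodes bytes the same way a library calling new String(byte[]) would.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // True if UTF-8 encoded text survives decoding with the platform default.
    static boolean defaultDecodesUtf8(String sample) {
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        return sample.equals(decodeWithDefault(utf8));
    }

    public static void main(String[] args) {
        System.out.println("Default charset:   " + Charset.defaultCharset());
        System.out.println("file.encoding:     " + System.getProperty("file.encoding"));
        // Implementation-specific property; may print null.
        System.out.println("sun.jnu.encoding:  " + System.getProperty("sun.jnu.encoding"));
        System.out.println("UTF-8 round trip:  " + defaultDecodesUtf8("\u00e9 \u00e0"));
    }
}
```

Running it once without any options and once with java -Dfile.encoding=UTF-8 should show whether the option has an effect on your particular JVM.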

Kisangani answered 17/9, 2008 at 9:12 Comment(5)
Has anyone tried the -Dfile.encoding approach? It would be great to be able to do this in a platform-agnostic way. - Contaminate
@MattPassell We use the following args when launching the JVM to ensure that we're specifying UTF-8 properly everywhere: -Dfile.encoding=ISO646-US -Dsun.jnu.encoding=ISO646-US and it appears to work fine. - Kisangani
Thanks for the response. Am I missing something? I just Googled for ISO646-US and found out it's an official name for ASCII. How does that help make sure you're using UTF-8? - Contaminate
@MattPassell It doesn't ensure it, but because the character set is so limited, it makes it blatantly obvious during development whenever we're not specifying the encoding explicitly. - Kisangani
Thanks! This solution worked for me after adding the JVM parameter when launching Tomcat. - Scurrile
1

I think you want this: System.setProperty("file.encoding", "UTF-8");

It solved some problems, but I still have others. The characters "í" and "Í" don't convert correctly if the OS uses ISO-8859-1. Only setting the JVM option at startup solved that for me. Now the only remaining problem is that the Java console in the NetBeans IDE garbles special characters.
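A possible reason the runtime call only helps partially: as far as I know, OpenJDK-based JVMs cache the default charset, so changing the property later does not change what String(byte[]) uses. Code that reads the property directly would still see the new value, which may explain why some things improved. A minimal sketch to check this on your own JVM (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;

class RuntimePropertyDemo {
    // Returns true if Charset.defaultCharset() is unchanged after
    // file.encoding has been set at runtime. The original property
    // value is restored before returning.
    static boolean defaultSurvivesPropertyChange(String newEncoding) {
        Charset before = Charset.defaultCharset();
        String oldValue = System.getProperty("file.encoding");
        try {
            System.setProperty("file.encoding", newEncoding);
            return before.equals(Charset.defaultCharset());
        } finally {
            if (oldValue == null) {
                System.clearProperty("file.encoding");
            } else {
                System.setProperty("file.encoding", oldValue);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Default before: " + Charset.defaultCharset());
        System.out.println("Unchanged after setProperty: "
                + defaultSurvivesPropertyChange("UTF-16"));
    }
}
```

If this reports true, the property has to be set on the command line (or the locale changed in the OS) before the JVM starts.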

Rupture answered 17/9, 2008 at 9:6 Comment(0)
1

Quoted from the Javadoc of Charset.defaultCharset():

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system.

In most OSes you can set the charset using an environment variable.
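If the charset cannot be changed for the whole machine, one workaround is to start the affected code in a child JVM with the environment and options set just for that process. This is only a sketch under several assumptions: the helper name is made up, the locale en_US.UTF-8 must actually be installed for LC_ALL to have an effect on Unix-like systems, and on Windows the default comes from the system code page rather than from this variable, so there only the -Dfile.encoding option would matter.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class Utf8ChildJvm {
    // Builds (but does not start) a launcher for a child JVM that requests
    // UTF-8 both through the locale environment and through file.encoding.
    static ProcessBuilder utf8Launcher(String classPath, String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Dfile.encoding=UTF-8");
        command.add("-cp");
        command.add(classPath);
        command.add(mainClass);

        ProcessBuilder builder = new ProcessBuilder(command);
        // Assumed locale name; adjust to one that exists on the target system.
        builder.environment().put("LC_ALL", "en_US.UTF-8");
        // Let the child share this process's stdin, stdout and stderr.
        builder.inheritIO();
        return builder;
    }

    public static void main(String[] args) {
        // "com.example.Main" is a placeholder for the real entry point.
        ProcessBuilder builder =
                utf8Launcher(System.getProperty("java.class.path"), "com.example.Main");
        System.out.println("Would run: " + builder.command());
    }
}
```

Calling start() on the returned builder launches the child; the parent JVM's own default charset stays as it was.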

Taskmaster answered 17/9, 2008 at 9:12 Comment(1)
Not really the answer I hoped for (I would have liked to be able to do it dynamically). Giving a sample of how to change the encoding for major OSes would be great. Thanks - Topsoil
