Where to get "UTF-8" string literal in Java?

I'm trying to use a constant instead of a string literal in this piece of code:

new InputStreamReader(new FileInputStream(file), "UTF-8")

"UTF-8" appears in the code quite often, and it would be much better to refer to a static final constant instead. Do you know where I can find such a constant in the JDK?

BTW, on second thought, such constants are bad design: see "Public Static Literals ... Are Not a Solution for Data Duplication".

Clarion asked 14/7, 2011 at 18:45 Comment(3)
See this question. – Sander
Note: if you are already on Java 7, use Files.newBufferedWriter(Path path, Charset cs) from NIO. – Alchemist
That's some really bad advice from your link. He wants you to make a wrapper class for every possible string constant you might use? – Jarodjarosite

In Java 1.7+, java.nio.charset.StandardCharsets defines Charset constants, including UTF_8.

import java.nio.charset.StandardCharsets;

// ...

String utf8Name = StandardCharsets.UTF_8.name(); // "UTF-8"

For Android, this requires minSdk 19 (API level 19).
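A minimal sketch, assuming Java 7+, of getting the String form of the constant for APIs that only accept a charset name, and of turning such a name back into a Charset. The class, field and method names are hypothetical; where an API has a Charset overload, passing StandardCharsets.UTF_8 directly is simpler.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only StandardCharsets and Charset are JDK APIs.
class Utf8NameDemo {
    // String form of the constant, for APIs that only accept a charset name.
    static final String UTF_8_NAME = StandardCharsets.UTF_8.name();

    // Resolves a charset name (case-insensitively) back to a Charset object.
    static Charset lookUp(String name) {
        return Charset.forName(name);
    }

    public static void main(String[] args) {
        System.out.println(UTF_8_NAME); // prints UTF-8
        // Charset.equals compares canonical names, so this prints true.
        System.out.println(lookUp("utf-8").equals(StandardCharsets.UTF_8));
    }
}
```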

Kwang answered 17/4, 2013 at 18:1 Comment(8)
do you use .toString() on that? – Stoic
.toString() will work, but the proper function is .name(). 99.9% of the time, toString() is not the answer. – Kwang
BTW, .displayName() will also work unless it is overridden for localization, as intended. – Kwang
You don't really need to call name() at all. You can directly pass the Charset object into the InputStreamReader constructor. – Brussels
Note that on Android, this requires API level 19+. – Beals
And there are other libs out there which do require a String, perhaps for legacy reasons. In such cases, I keep a Charset object around, typically derived from StandardCharsets, and use name() if needed. – Throstle
The result of name(), toString() and just putting StandardCharsets.UTF_8 directly is all the same, because Charset.toString() just calls Charset.name(), and if you concatenate StandardCharsets.UTF_8 into a String, Charset.toString() is called automatically. – Claycomb
When I used only toString(), on Android 6 it returned "java.nio.charset.CharsetICU[UTF-8]", definitely not what I wanted. Using .name() returned the correct value "UTF-8". – Jardiniere

Now I use the org.apache.commons.lang3.CharEncoding.UTF_8 constant from Commons Lang.

Clarion answered 26/4, 2012 at 8:4 Comment(4)
For those using Lang 3.0: org.apache.commons.lang3.CharEncoding.UTF_8. (Note "lang3".) – Mofette
If you're using Java 1.7, see @Roger's answer below, since it's part of the standard library. – Dunleavy
P.S. "@Roger's answer below" is now @Roger's answer above. ☝ – Lakieshalakin
That class is deprecated because Java 7 introduced java.nio.charset.StandardCharsets. – Phosphor

The Google Guava library (which I'd highly recommend anyway, if you're doing work in Java) has a Charsets class with static fields like Charsets.UTF_8, Charsets.UTF_16, etc.

Since Java 7, you should just use java.nio.charset.StandardCharsets instead for comparable constants.

Note that these constants aren't strings; they're actual Charset instances. All standard APIs that take a charset name also have an overload that takes a Charset object, which you should use instead.
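To illustrate the point about Charset overloads, a minimal sketch assuming Java 7+: String.getBytes(Charset) and new String(byte[], Charset) declare no checked exception, unlike their charset-name counterparts, which throw UnsupportedEncodingException. URLEncoder.encode only gained a Charset overload in Java 10, so for Java 7 to 9 the sketch passes StandardCharsets.UTF_8.name() instead. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class CharsetOverloadDemo {
    // Charset overload: no checked exception to handle.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Charset overload of the String constructor: also no checked exception.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Before Java 10, URLEncoder.encode only takes a charset *name*,
    // so we pass name() and handle the checked exception.
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported, so this should be unreachable.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 & bar";
        System.out.println(encode(text).length); // prints 11
        System.out.println(urlEncode(text));     // prints caf%C3%A9+%26+bar
    }
}
```

On Java 10 or later, URLEncoder.encode(s, StandardCharsets.UTF_8) removes the try/catch entirely.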

Gendron answered 14/7, 2011 at 18:52 Comment(8)
So, should be Charsets.UTF_8.name()? – Balcom
@kilaka Yeah, use name() instead of displayName(), since name() is final and displayName() is not. – Anneal
Bad idea to use third-party code that's constantly modified, breaking backwards compatibility, to accomplish something you can do with the standard SDK. – Colter
@Buffalo: Please read my answer again: it recommends using java.nio.charset.StandardCharsets when possible, which is not third-party code. Additionally, the Guava Charsets definitions are not "constantly modified" and AFAIK have never broken backwards compatibility, so I don't think your criticism is warranted. – Gendron
We've had multiple issues when upgrading the Guava libraries. – Colter
@Buffalo: That's as it may be, but I doubt your issues had anything to do with the Charsets class. If you want to complain about Guava, that's fine, but this is not the place for those complaints. – Gendron
Please do not include a multi-megabyte library to get one string constant. – Hildy
"All standard APIs that take a charset name also have an overload that takes a Charset object" is not quite true. One example is java.net.URLEncoder.encode(String, String), which does not have an overload taking a Charset parameter. – Wilmot

In case this page comes up in someone's web search: as of Java 1.7, you can use java.nio.charset.StandardCharsets to access constant definitions of the standard charsets.
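Expanding the Files.readAllLines call discussed in the comments into a minimal round-trip sketch, assuming Java 7+ and a writable temporary directory: Files.write(Path, Iterable, Charset, OpenOption...) and Files.readAllLines(Path, Charset) both accept the Charset constant directly. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ReadAllLinesDemo {
    // Writes the lines as UTF-8 to a temporary file, then reads them back as UTF-8.
    static List<String> roundTrip(List<String> lines) throws IOException {
        Path file = Files.createTempFile("utf8-demo", ".txt");
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList("first line", "zweite Zeile \u00fc");
        System.out.println(roundTrip(lines).equals(lines)); // prints true
    }
}
```

On Java 8+, Files.readAllLines(Path) without a charset argument also decodes as UTF-8, but passing the constant keeps the intent explicit.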

Boxboard answered 17/4, 2013 at 4:56 Comment(4)
I have been trying to use this, but it does not seem to work. 'Charset.defaultCharset());' seems to work after including 'java.nio.charset.*', but I can't seem to explicitly refer to UTF-8 when I am trying to use 'Files.readAllLines'. – Kwang
@Kwang What seems to be the problem? From what I can see, you can just call: Files.readAllLines(Paths.get("path-to-some-file"), StandardCharsets.UTF_8); – Boxboard
I don't know what the problem was, but it worked for me after changing something which I can't remember. – Kwang
^^^ You probably had to change the target platform in the IDE. If 1.6 was your latest JDK when you installed the IDE, it probably picked it as the default and kept it as the default long after you'd updated both the IDE and the JDK in place. – Innermost

This constant (along with others such as UTF-16, US-ASCII, etc.) is also available in the class org.apache.commons.codec.CharEncoding.

Welles answered 10/1, 2013 at 22:33 Comment(0)

There are none (at least in the standard Java library, as of Java 6). Character sets vary from platform to platform, so there isn't a standard list of them in Java.

There are some 3rd party libraries which contain these constants though. One of these is Guava (Google core libraries): http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Charsets.html

Bathrobe answered 14/7, 2011 at 18:51 Comment(3)
It took me a second to catch on to this... Guava's Charsets constants are (no surprise) Charsets, not Strings. InputStreamReader has another constructor that takes a Charset rather than a String. If you really need the string, it's e.g. Charsets.UTF_8.name(). – Encaenia
Character sets may vary from platform to platform, but UTF-8 is guaranteed to exist. – Dartmoor
All charsets defined in StandardCharsets are guaranteed to exist in every Java implementation on every platform. – Lablab

You can use the Charset.defaultCharset() API or the file.encoding system property.

But if you want your own constant, you'll need to define it yourself.
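If StandardCharsets is unavailable (Java 6, or Android below API level 19), a minimal sketch of defining the constant yourself might look like this. The holder class Encodings and its field names are hypothetical. It deliberately avoids Charset.defaultCharset(), which may or may not be UTF-8 depending on the JDK version, OS and locale.

```java
import java.nio.charset.Charset;

// Hypothetical holder class for project-wide encoding constants.
final class Encodings {
    // String form, for APIs that take a charset name.
    static final String UTF_8_NAME = "UTF-8";

    // Charset form; every Java platform must support UTF-8,
    // so forName() should not throw here.
    static final Charset UTF_8 = Charset.forName(UTF_8_NAME);

    private Encodings() {
        // no instances
    }

    public static void main(String[] args) {
        System.out.println(UTF_8.name()); // prints UTF-8
        // Platform-dependent: not a substitute for the constant above.
        System.out.println(Charset.defaultCharset().name());
    }
}
```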

Underlayer answered 14/7, 2011 at 18:49 Comment(1)
The default charset is usually determined by the OS and locale settings; I don't think there is any guarantee that it remains the same across multiple Java invocations. So this is no replacement for a constant specifying "UTF-8". – Gunilla

In Java 1.7+

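Do not use the "UTF-8" string; instead, use a Charset parameter:

import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// ...

Reader reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);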
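A self-contained version of that snippet as a minimal sketch, assuming Java 7+: try-with-resources closes the FileInputStream, and an OutputStreamWriter with the same constant writes the test file. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class InputStreamReaderDemo {
    // Reads the whole file, decoding it as UTF-8 regardless of the platform default.
    static String readAll(File file) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    // Writes text to the file, encoding it as UTF-8.
    static void writeAll(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        writeAll(file, "gr\u00fc\u00dfe");
        System.out.println(file.length());          // prints 7 (bytes)
        System.out.println(readAll(file).length()); // prints 5 (chars)
    }
}
```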
Direct answered 23/7, 2018 at 15:46 Comment(0)

If you are using OkHttp for Java/Android, you can use the following constant:

import java.nio.charset.Charset;
import com.squareup.okhttp.internal.Util;

Charset utf8 = Util.UTF_8;           // Charset
String utf8Name = Util.UTF_8.name(); // String
Nephritis answered 7/10, 2015 at 8:58 Comment(1)
It has been removed from OkHttp, so the alternative is Charset.forName("UTF-8").name() when you need to support Android below API 19; otherwise, you can use StandardCharsets.UTF_8.name(). – Cyclosis

Constant definitions for the standard Charsets. These charsets are guaranteed to be available on every implementation of the Java platform. Since 1.7.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

Charset utf8 = StandardCharsets.UTF_8;
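To check that guarantee, a minimal sketch that collects the six standard charsets listed in the Charset Javadoc (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16) from StandardCharsets and confirms that Charset.isSupported reports each of them. The class and method names are hypothetical. Charset.availableCharsets() usually returns many more entries, and that extra set is what varies between platforms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class StandardCharsetsDemo {
    // The six charsets every Java implementation must support.
    static final List<Charset> GUARANTEED = Arrays.asList(
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16);

    // True if the runtime reports every guaranteed charset as supported.
    static boolean allSupported() {
        for (Charset cs : GUARANTEED) {
            if (!Charset.isSupported(cs.name())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Charset cs : GUARANTEED) {
            System.out.println(cs.name());
        }
        System.out.println(allSupported()); // prints true
        // The full set is platform- and JDK-dependent.
        System.out.println(Charset.availableCharsets().size());
    }
}
```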
Bristling answered 26/3, 2018 at 10:34 Comment(0)

The class org.apache.commons.lang3.CharEncoding (including CharEncoding.UTF_8) is deprecated because Java 7 introduced java.nio.charset.StandardCharsets. Its Javadoc reads:

  • @see JRE character encoding names
  • @since 2.1
  • @deprecated Java 7 introduced {@link java.nio.charset.StandardCharsets}, which defines these constants as {@link Charset} objects. Use {@link Charset#name()} to get the string values provided in this class. This class will be removed in a future release.
Phosphor answered 11/5, 2020 at 3:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.