string decode utf-8

How can I decode a UTF-8 string on Android? I tried these commands, but the output is the same as the input:

URLDecoder.decode("hello&//à", "UTF-8");

new String("hello&//à", "UTF-8");

EntityUtils.toString("hello&//à", "utf-8");
Tympanic answered 9/5, 2011 at 22:17 Comment(2)
That String is not in any particular encoding. What problem are you actually trying to solve? What exactly do you mean by "decode"? What encoding did you think it was in? – Miculek
Try using a local variable to hold the result, e.g. String str = URLDecoder.decode("hello&//à", "UTF-8"); – Xylotomy
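For context on the first attempt: URLDecoder undoes URL (percent) encoding, turning "%XX" escapes back into bytes (decoded with the given charset) and '+' into a space; every other character is copied through unchanged. Since "hello&//à" contains no escapes, getting the input back is the expected result. A minimal sketch, assuming plain Java; the class and method names are illustrative only:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeDemo {
    // Undoes URL (percent) encoding: "%XX" escapes become bytes, which are then
    // decoded as UTF-8, and '+' becomes a space. Everything else is copied as-is.
    static String urlDecodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // The question's input has no %XX escapes, so it comes back unchanged.
        System.out.println(urlDecodeUtf8("hello&//\u00e0"));
        // Percent-encoded form of the same text: %C3%A0 are the UTF-8 bytes of U+00E0.
        System.out.println(urlDecodeUtf8("hello%26%2F%2F%C3%A0"));
    }
}
```

Both calls print the same text, which suggests URLDecoder was never the right tool for this input.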

A string needs no encoding. It is simply a sequence of Unicode characters.

You need to encode when you want to turn a String into a sequence of bytes. The charset that you choose (UTF-8, cp1255, etc.) determines the character-to-byte mapping. Note that a character is not necessarily translated into a single byte: in UTF-8, for example, only the 128 ASCII characters take one byte, and every other character takes two to four bytes.
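A small sketch of that character-to-byte mapping, assuming Java 7+ or Android API 19+ (for StandardCharsets); the class and method names are made up for illustration. Non-ASCII characters are written as \u escapes so the source compiles regardless of the compiler's file encoding:

```java
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies once encoded as ISO-8859-1,
    // a single-byte charset (one byte per character it can represent).
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041 LATIN CAPITAL LETTER A (ASCII)
            "\u00e0",       // U+00E0 LATIN SMALL LETTER A WITH GRAVE
            "\u20ac",       // U+20AC EURO SIGN
            "\uD83D\uDE00"  // U+1F600, stored in a Java String as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X: %d byte(s) in UTF-8%n", s.codePointAt(0), utf8Length(s));
        }
        // The same U+00E0 needs only one byte in ISO-8859-1.
        System.out.println("U+00E0 in ISO-8859-1: " + latin1Length("\u00e0") + " byte(s)");
    }
}
```

The UTF-8 lengths grow from 1 to 4 bytes across the samples, while the single-byte charset uses one byte for the accented letter.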

A String is encoded like this:

String s1 = "some text";
byte[] bytes = s1.getBytes("UTF-8"); // Charset to encode into

You need to decode when you have a sequence of bytes and you want to turn them into a String. When you do that, you need to specify, again, the charset with which the bytes were originally encoded (otherwise you'll end up with garbled text).

Decoding:

String s2 = new String(bytes, "UTF-8"); // Charset with which bytes were encoded 
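Putting the two snippets together, a minimal round-trip sketch (assuming Java 7+ or Android API 19+; class and method names are illustrative). It uses the StandardCharsets constants instead of the "UTF-8" string, which avoids the checked UnsupportedEncodingException, and shows what decoding with the wrong charset does:

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Encode: String -> bytes using UTF-8.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode: bytes -> String, assuming the bytes really are UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s1 = "caf\u00e9";         // "cafe" with e-acute
        byte[] bytes = encodeUtf8(s1);   // e-acute becomes the two bytes C3 A9

        // Same charset on both sides: the original text comes back.
        String s2 = decodeUtf8(bytes);
        System.out.println(s2.equals(s1)); // true

        // Wrong charset when decoding: ISO-8859-1 reads C3 A9 as two separate
        // characters (U+00C3, U+00A9), i.e. garbled text ("mojibake").
        String garbled = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(s1)); // false
        System.out.println(garbled.length() + " chars instead of " + s1.length());
    }
}
```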

If you want to understand this better, a great text is Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

Cater answered 9/5, 2011 at 22:52 Comment(0)

The core functions are getBytes(String charset) and new String(byte[] data). You can use these functions to do UTF-8 decoding.

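UTF-8 decoding here is really a String-to-String conversion, with a byte array as the intermediate buffer. Since the bytes are to be decoded as UTF-8, the only argument needed for new String() is the byte array: on Android the default charset is always UTF-8, so new String(bytes) is equivalent to new String(bytes, "UTF-8"). On other JVMs the default charset may differ, so passing "UTF-8" explicitly is safer.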

The key, then, is the charset passed to getBytes() to turn the wrongly decoded input string back into its original bytes, and you should know it beforehand. If you don't, guess the most likely one: "ISO-8859-1" is a good guess for English-language users.

The decoding statement then becomes:

String decoded = new String(encoded.getBytes("ISO-8859-1"), "UTF-8");
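A sketch of what this repair does and when it applies, assuming the text is UTF-8 that was mistakenly decoded as ISO-8859-1 (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Reverses a wrong ISO-8859-1 decode of UTF-8 bytes:
    // 1. getBytes(ISO_8859_1) turns each char U+0000..U+00FF back into its original byte,
    // 2. new String(..., UTF_8) decodes those bytes with the correct charset.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the problem: UTF-8 bytes of "cafe" with e-acute read as ISO-8859-1.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 5: e-acute became two chars

        String fixed = repair(garbled);
        System.out.println(fixed.equals("caf\u00e9")); // true
    }
}
```

The trick only works in that specific case: ISO-8859-1 maps each of the 256 byte values to exactly one character, so getBytes("ISO-8859-1") recovers the original bytes losslessly. Applied to text that was already decoded correctly, such as "café", it corrupts it instead: the lone byte 0xE9 is not valid UTF-8 and becomes the replacement character U+FFFD.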
Selfregulated answered 12/2, 2015 at 22:7 Comment(1)
It's not clear how incomplete multibyte UTF-8 sequences are handled. – Radiothermy
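On the comment's point: decoding each chunk separately with new String(chunk, ...) breaks any character whose bytes straddle a chunk boundary. A stateful CharsetDecoder avoids that, because with endOfInput set to false it leaves an unfinished trailing sequence in the input buffer. A minimal sketch, assuming the chunks arrive as a List<byte[]> (a hypothetical input shape chosen for the example) and Java 7+ / Android API 19+:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ChunkedUtf8 {
    // Naive: decode each chunk on its own. A character split across two
    // chunks turns into replacement characters (U+FFFD).
    static String decodeNaively(List<byte[]> chunks) {
        StringBuilder sb = new StringBuilder();
        for (byte[] chunk : chunks) {
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Stateful: decode() with endOfInput == false stops before an incomplete
    // trailing sequence; compact() keeps those bytes for the next chunk.
    static String decodeStreaming(List<byte[]> chunks) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // reports malformed input
        int maxChunk = 0;
        for (byte[] chunk : chunks) maxChunk = Math.max(maxChunk, chunk.length);
        // At most 3 bytes of an unfinished UTF-8 sequence are carried over.
        ByteBuffer in = ByteBuffer.allocate(maxChunk + 3);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(maxChunk + 3);
        StringBuilder sb = new StringBuilder();
        for (byte[] chunk : chunks) {
            in.put(chunk);
            in.flip();                             // switch the buffer to reading
            check(decoder.decode(in, out, false)); // more input may follow
            in.compact();                          // keep leftovers, back to writing
            drain(out, sb);
        }
        in.flip();
        check(decoder.decode(in, out, true));      // leftovers now count as malformed
        check(decoder.flush(out));
        drain(out, sb);
        return sb.toString();
    }

    private static void check(CoderResult result) throws CharacterCodingException {
        if (result.isError()) {
            result.throwException();
        }
    }

    private static void drain(CharBuffer out, StringBuilder sb) {
        out.flip();
        sb.append(out);
        out.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "h", then U+00E0 (C3 A0 in UTF-8) split across two chunks, then "!".
        List<byte[]> chunks = Arrays.asList(
                new byte[] {0x68, (byte) 0xC3},
                new byte[] {(byte) 0xA0, 0x21});
        System.out.println(decodeNaively(chunks).contains("\uFFFD"));   // true
        System.out.println(decodeStreaming(chunks).equals("h\u00e0!")); // true
    }
}
```

When the bytes come from an InputStream, wrapping it in an InputStreamReader with the UTF-8 charset gives the same behavior without manual buffer handling.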

Try looking at the question "decode string encoded in utf-8 format in android", but it doesn't look like your string is encoded in anything in particular. What do you think the output should be?

Barnwell answered 9/5, 2011 at 22:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.