Decode a string encoded in UTF-8 format in Android

I have a string that comes from an XML document and contains German text. The German-specific characters are encoded in UTF-8. Before displaying the string, I need to decode it.

I have tried the following:

try {
    BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
    event.attributes.put("title", in.readLine());
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

I have also tried this:

try {
    event.attributes.put("title", URLDecoder.decode(nodevalue, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Neither of them works. How do I decode the German string?

Thank you in advance.

UPDATE:

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    // TODO Auto-generated method stub
    super.characters(ch, start, length);
    if (nodename != null) {
        String nodevalue = String.copyValueOf(ch, 0, length);
        if (nodename.equals("startdat")) {
            if (event.attributes.get("eventid").equals("187")) {
            }
        }
        if (nodename.equals("startscreen")) {
            imageaddress = nodevalue;
        }
        else {
            if (nodename.equals("title")) {
                // try {
                // BufferedReader in = new BufferedReader(
                // new InputStreamReader(
                // new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
                // event.attributes.put("title", in.readLine());
                // } catch (UnsupportedEncodingException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // } catch (IOException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // }
                // try {
                // event.attributes.put("title",
                // URLDecoder.decode(nodevalue, "UTF-8"));
                // } catch (UnsupportedEncodingException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // }
                event.attributes.put("title", StringEscapeUtils
                        .unescapeHtml(new String(ch, start, length).trim()));
            } else
                event.attributes.put(nodename, nodevalue);
        }
    }
}
Particia answered 29/4, 2011 at 5:0 Comment(1)
I could not find this Q&A when I needed it, so I have retagged it now. I hope it will pop up quicker next time. – Verdi

You could use the String constructor with the charset parameter:

try
{
    final String s = new String(nodevalue.getBytes(), "UTF-8");
}
catch (UnsupportedEncodingException e)
{
    Log.e("utf8", "conversion", e);
}
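
A note on what this conversion can and cannot do, as a minimal sketch rather than a definitive fix. Encoding and decoding with the same charset returns the string unchanged, and on Android the default charset used by getBytes() is UTF-8, so the snippet above only changes anything when the default charset differs. Re-decoding helps in one specific case: UTF-8 bytes that were earlier misread as ISO-8859-1. The class and method names below are hypothetical, and StandardCharsets is assumed to be available (on Android, API level 19 and later).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not from the question.
class CharsetRoundTrip {

    // Repairs text whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
    static String repairLatin1Mojibake(String garbled) {
        // ISO-8859-1 maps each char 0..255 back to the single byte it came from.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    // Encoding and decoding with the same charset gives back the input.
    static String sameCharsetRoundTrip(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "D\u00fcsseldorf";
        // Simulate the misread: UTF-8 bytes interpreted as ISO-8859-1.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(repairLatin1Mojibake(garbled).equals(correct));
        System.out.println(sameCharsetRoundTrip(garbled).equals(garbled));
    }
}
```

If the text already contains the right characters, or contains something else entirely (such as HTML entities), neither method will change what is displayed.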

Also, since you get the data from an XML document, which I assume is encoded in UTF-8, the problem is probably in how it is parsed.

You should pass an InputStream or InputSource to the parser instead of relying on an XMLReader implementation, because the stream carries the encoding information. So if you're getting this data from an HTTP response, you could either use both InputStream and InputSource:

try
{
    HttpEntity entity = response.getEntity();
    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    Reader reader = new InputStreamReader(in, "UTF-8");
    InputSource is = new InputSource(reader);
    is.setEncoding("UTF-8");
    parser.parse(is, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}

or just the InputStream:

try
{
    HttpEntity entity = response.getEntity();
    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    parser.parse(in, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}
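
To illustrate why handing the parser the raw bytes can be enough, here is a small self-contained sketch. It assumes the document carries an XML declaration naming its encoding; the class name, the sample XML and the ISO-8859-1 choice are made up for the demonstration and are not taken from the question's feed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses XML bytes and returns all character data.
class DeclaredEncodingDemo {

    // Collects text from every characters() call; a real handler would
    // also track which element it is currently inside.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // Use start and length: the parser may reuse one large buffer.
            text.append(ch, start, length);
        }
    }

    static String parseText(byte[] xmlBytes) {
        TextCollector handler = new TextCollector();
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            // The parser reads the encoding from the XML declaration itself.
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Error parsing xml", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<title>D\u00fcsseldorf</title>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(latin1Bytes).equals("D\u00fcsseldorf"));
    }
}
```

The handler appends to a StringBuilder because SAX is allowed to deliver the text of one element in several characters() calls.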

Update 1

Here is a sample of a complete request and response handling:

try
{
    final DefaultHttpClient client = new DefaultHttpClient();
    final HttpPost httppost = new HttpPost("http://example.location.com/myxml");
    final HttpResponse response = client.execute(httppost);
    final HttpEntity entity = response.getEntity();

    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    parser.parse(in, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}

Update 2

Since the problem is not the encoding but the source XML being escaped to HTML entities, the best solution (besides correcting the PHP so that it does not escape the response) is to use the very handy static StringEscapeUtils class from the Apache Commons Lang library (org.apache.commons.lang).

After importing the library, put the following in your XML handler's characters method:

@Override
public void characters(final char[] ch, final int start, final int length) 
    throws SAXException
{
    // This variable will hold the correct unescaped value
    final String elementValue = StringEscapeUtils.
        unescapeHtml(new String(ch, start, length).trim());
    [...]
}

Update 3

In your latest code, the problem is the initialization of the nodevalue variable. It should be:

String nodevalue = StringEscapeUtils.unescapeHtml(
    new String(ch, start, length).trim());
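
For readers who want to see what the unescaping step amounts to without adding the library, here is a rough standard-library sketch. It is not the Commons Lang implementation: the class name and the small entity table are hypothetical, and it only knows a handful of German entities, whereas StringEscapeUtils covers far more.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces a few named HTML entities with their characters.
class GermanEntityUnescaper {

    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("auml", "\u00e4");
        ENTITIES.put("ouml", "\u00f6");
        ENTITIES.put("uuml", "\u00fc");
        ENTITIES.put("Auml", "\u00c4");
        ENTITIES.put("Ouml", "\u00d6");
        ENTITIES.put("Uuml", "\u00dc");
        ENTITIES.put("szlig", "\u00df");
    }

    // Matches named entities such as &uuml; and captures the name.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            // Entities missing from the table are copied through unchanged.
            String result = replacement != null ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(result));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("D&uuml;sseldorf").equals("D\u00fcsseldorf"));
        System.out.println(unescape("Stra&szlig;e").equals("Stra\u00dfe"));
    }
}
```

Whichever unescaper is used, it is probably safer to apply it to the complete text of an element rather than to each characters() chunk, since a parser may split the text in the middle of an entity.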
Crater answered 29/4, 2011 at 5:29 Comment(22)
Where do I pass the URL here? – Particia
You mean the URL from which you get the XML response? – Crater
Yes. Also, what is the response object you used? Is it an HttpResponse object? – Particia
Please check my update for the full request method. I used HttpPost there, so if you need to set an entity with NameValuePair parameters, you can encode them in "UTF-8" too. – Crater
I applied your update, and I am also doing the conversion with final String s = new String(nodevalue.getBytes(), "UTF-8"); but the characters are still not being decoded. – Particia
Could you please share the URL from which you get your XML, so I can give it a try? – Crater
Thank you. The problem is that the XML that comes from your PHP is escaped to HTML entities, not encoded in UTF-8. So instead of serving %FC for the character ü, it serves &uuml; – Crater
Can I go back to using the XMLReader for the latest update to work, or do I still need to use the InputStream? – Particia
Since the problem wasn't the decoding, you might get it to work using only the Apache Commons StringEscapeUtils, as in my sample above. You should give it a try with the XMLReader + StringEscapeUtils. – Crater
How do I import the external library? I have done the following: right-click on project -> build path -> add lib -> select the zip file I downloaded. Now the zip file is within the Referenced Libraries folder in my project. How do I import the StringEscapeUtils class from this library into my project? – Particia
You must unzip the archive and add only the commons-lang-2.6.jar library file from inside it to your application. (By the way, I couldn't open the zip, so I had to download the tar.gz file.) – Crater
There is no jar file once I open the zip. I am now downloading the tar.gz file. What steps do I follow with this file? – Particia
You should extract it, find the commons-lang-2.6.jar file right in its root, and copy it into your project's /lib directory (if it doesn't exist, create it). Once the jar is there, right-click the jar file in Eclipse and choose Build path > Add to build path. That's it. – Crater
Sorry, it was my mistake: I downloaded the wrong file. I had downloaded the .zip file from the source section. Now I have downloaded the zip file from the binaries section, and there are 3 jar files in it. – Particia
Done, I did what you said and imported it. I used the StringEscapeUtils class but still got no results. Did you try it yourself? Were the characters being decoded? – Particia
Yes, I did try it and got Düsseldorf, etc., so I guess it must work. Where are you using StringEscapeUtils? – Crater
When I get the title tag, inside the characters method. I have imported all three jar files that came in the zip file: commons-lang-2.6.jar, commons-lang-2.6-sources.jar, commons-lang-2.6-javadoc.jar. Let me post the code that I am using. – Particia
OK, but you only need the core library; neither the javadoc nor the sources jar should be imported. They can be referenced, though, for viewing the source and the javadoc of the classes. So only commons-lang-2.6.jar must be included. – Crater
Please check out the code that I have posted above. I am putting the unescaped string into a HashMap, which I then put into an SQLite database. – Particia
By the way, how did you come to know of the external library? I mean, knowing such things requires a whole lot of knowledge. If there is something you do to learn this, I think I need to put it into practice too, so I won't have to struggle that much :-) – Particia
See my update for the problem. You can unescape the nodevalue variable right when initializing it. – Crater
About "knowing the external library": long years of googling bear fruit ;) I've used a lot of Apache libraries in other (mainly web) projects, and fortunately this one can be integrated into Android apps too (no native code). – Crater
