HttpClient ignoring encoding, on a single computer

I am using HttpClient (version 3.1) on several different (but apparently identical) computers to read UTF-8 encoded JSON data from a URL.

On all the machines, save one, it works fine. I have some Spanish language words and they come through with accents and tildes intact.

One computer stubbornly refuses to cooperate. It is apparently treating the data as ISO-8859-1, despite a Content-Type: application/json;charset=utf-8 header.

If I use curl to access that URL from that computer, it works correctly. On every other computer, both curl and my HttpClient-based program work correctly.

I ran md5sum on the common-httpclient.jar file on each machine; the checksums are identical.

Is there some setting, deep in Linux, that might be different and be messing with me? Any other theories, or even places to look?

EDIT: some people asked for more details.

Originally I had the problem deep in the bowels of a complex Tomcat app, but I lightly adapted the HttpClient sample program to just retrieve the URL in question, and (fortunately) it showed the same problem.

These are Linux 2.6 machines running jdk1.7.0_45.

An env command yields a bunch of variables. The only one that looks remotely on point is LANG=en_US.UTF-8.

Criticaster answered 15/5, 2014 at 6:36 Comment(8)
Could you explain a little more about the machine on which it isn't working? Is it Linux? Which one? - Instruction
Can you clarify the setup? Is the problem with a command-line client that uses HttpClient to access some URL? Which locale environment variables are set on this computer? - Meritocracy
@Instruction Answered in the edit. - Criticaster
How are you then viewing the results? Can you post a short but complete program which demonstrates the problem? (You talk about the sample in the HttpClient docs. Are you saying you just need to change the URL in there? Note that the sample uses the platform-default encoding, which is a bad idea.) Can you save the binary data to a file, and compare that with what curl downloads? That would isolate the problem. - Mhd
Do you use Spring MVC in your server-side controller? - Salesclerk
Why is it Content-Type: application/json and not Content-Type= application/json? - Dimond
Almost certainly the system character set that the JVM ends up using is different on that one machine. The following ServerFault question may be relevant: serverfault.com/questions/149071/… - Kelwunn
Did you try setting charset=UTF-8? Linux might be case sensitive, although UTF-8 is the default for JSON data. Also, did you try different browsers (Chrome, Firefox, Opera, ...) on the same machine? Same result? - Pichardo

How do you get the json response data from HttpClient?

If you get it back in binary form (through getResponseBodyAsStream(), for example) and then convert it to a String without specifying a charset, the result depends on your JVM's default charset.

You can check the JVM's default charset with:

Charset.defaultCharset().name()

This might give "UTF-8" on all machines except the one failing.
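As a minimal sketch of why the default matters (the class name CharsetCheck and the sample word are illustrative assumptions, not taken from the question), the same UTF-8 bytes can be decoded three ways: implicitly with the JVM default, explicitly as UTF-8, and explicitly as ISO-8859-1. Running it on the failing machine and on a working one should show whether the implicit line differs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Decodes bytes with an explicitly named charset, so the result
    // does not depend on how the machine or JVM is configured.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample payload: a Spanish word encoded as UTF-8.
        byte[] body = "ma\u00f1ana".getBytes(StandardCharsets.UTF_8);

        // What the JVM picked up from its environment.
        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // new String(byte[]) without a charset uses the JVM default.
        System.out.println("implicit: " + new String(body));
        System.out.println("UTF-8:    " + decode(body, StandardCharsets.UTF_8));
        System.out.println("Latin-1:  " + decode(body, StandardCharsets.ISO_8859_1));
    }
}
```

If the default does turn out to differ, passing the charset explicitly everywhere is the more robust fix. Starting the JVM with -Dfile.encoding=UTF-8 is a commonly used alternative, though whether it is appropriate for a given Tomcat setup is something to verify rather than assume.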

Orinasal answered 22/5, 2014 at 13:25 Comment(0)

Without seeing your code, it is difficult to say what's wrong, but here is a "correct" way of doing this (using HttpClient 3.1.0 for request and Jackson 2.1.3 to parse the JSON).

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus; // HttpClient 3.x package, not org.apache.http
import org.apache.commons.httpclient.methods.GetMethod;

import java.io.IOException;
import java.io.InputStreamReader;

// class and method names are illustrative
public class JsonFetcher {
    public static JsonNode fetch(String uri) throws IOException {
        HttpClient hc = new HttpClient();
        GetMethod get = new GetMethod(uri);
        try {
            int status = hc.executeMethod(get);
            if (status != HttpStatus.SC_OK) throw new RuntimeException("http status " + status);
            ObjectMapper jsonParser = new ObjectMapper();
            // we use an InputStreamReader with explicit charset to read the response body
            return jsonParser.readTree(
                new InputStreamReader(get.getResponseBodyAsStream(), get.getResponseCharSet())
            );
        } finally {
            // return the connection to the connection manager
            get.releaseConnection();
        }
    }
}
Gorizia answered 24/5, 2014 at 13:56 Comment(0)

I have faced this issue before; in my case it was caused by the encoding configured on the client. I had to use a workaround like the one below:

String encmsg = new String(respStr.getBytes("ISO-8859-1"), java.nio.charset.Charset.forName("UTF-8"));

It turns the wrongly decoded String back into its original bytes using ISO-8859-1, then decodes those bytes as UTF-8.
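The round trip can be sketched in isolation as follows. The class name MojibakeRepair and the sample word are hypothetical, chosen only for illustration. The sketch first reproduces the garbling by decoding UTF-8 bytes as ISO-8859-1, then undoes it.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Reverses a wrong ISO-8859-1 decode. ISO-8859-1 maps each byte to
    // exactly one char, so encoding with it recovers the original bytes,
    // which can then be decoded with the charset they were written in.
    static String repair(String wronglyDecoded) {
        byte[] originalBytes = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: UTF-8 bytes of a Spanish word.
        byte[] utf8 = "se\u00f1or".getBytes(StandardCharsets.UTF_8);

        // Simulate the faulty machine: decode the bytes as ISO-8859-1.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("garbled:  " + garbled);

        // Undo the wrong decode.
        System.out.println("repaired: " + repair(garbled));
    }
}
```

This only works when the wrong decode really was ISO-8859-1, and it hides the root cause rather than fixing it, so decoding with the right charset in the first place is preferable where possible.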

Stalinism answered 26/5, 2014 at 3:57 Comment(0)
