UTF-8 Strings getting scrambled by Restlet on GAE

I have a simple Restlet service hosted on AppEngine. This performs basic CRUD operations with strings and is working well with all sorts of UTF-8 characters when I test it with curl (for all the verbs).

This is consumed by a simple Restlet client hosted in a servlet on another AppEngine app:

// set response type
resp.setContentType("application/json");
// Create the client resource
ClientResource resource = new ClientResource(Messages.SERVICE_URL + "myentity/id");
// Customize the referrer property
resource.setReferrerRef("myapp");
// Write the response
resource.get().write(resp.getWriter());

The above is pretty much all I have in the servlet. Very plain.

The servlet is invoked via jQuery Ajax, and the JSON I get back is well formed, but UTF-8 encoded strings come back scrambled. For example, Université de Montréal becomes Universit?? de Montr??al.

I tried adding this line in the servlet (before everything else):

resp.setCharacterEncoding("UTF-8");

But the only difference is that instead of ?? I now get Universitᅢᄅ de Montrᅢᄅal (I don't even know what kind of characters those are; Asian, I suppose).
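One guess at the mechanism, not a confirmed diagnosis of Restlet: both outputs are what you would see if the UTF-8 bytes of the response were widened to chars one byte at a time with sign extension somewhere along the way. The sketch below only reproduces the symptoms in plain Java; `SignExtensionDemo` and `widenBytes` are names made up for the illustration, and nothing in it calls Restlet.

```java
import java.nio.charset.StandardCharsets;

public class SignExtensionDemo {

    // Hypothetical bug: each byte is cast straight to a char.
    // Java bytes are signed, so 0xC3 becomes U+FFC3 instead of U+00C3.
    public static String widenBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The accented letter is written as an escape so the source file encoding does not matter.
        byte[] utf8 = "Montr\u00e9al".getBytes(StandardCharsets.UTF_8);
        String widened = widenBytes(utf8);

        // Print the code points of the two chars that replaced the accented letter.
        System.out.printf("U+%04X U+%04X%n", (int) widened.charAt(5), (int) widened.charAt(6));

        // A Latin-1 encoder cannot represent those chars and substitutes '?' for each.
        byte[] latin1 = widened.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

The two chars produced for é land in the halfwidth Hangul range, which would account for the Asian-looking characters once the servlet writer is UTF-8, while a servlet writer left at its default ISO-8859-1 encoding would turn each of them into a question mark.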

I am 100% sure the Restlet service is OK: besides debugging it line by line, I can test it from the command line with curl, and it returns well-formed strings.

By looking at the HTTP headers of the response in Firefox (when calling the servlet via JavaScript), I can see that the encoding is indeed UTF-8, as expected. After hours of struggling and reading every related article I could find, I came across this Restlet discussion and noticed that I do have Transfer-Encoding: chunked in the HTTP headers of the response. I tried the proposed solutions: overriding ClientResource.toRepresentation didn't do any good, so I tried Restlet 2.1 as suggested, with ClientResource.setRequestEntityBuffering(true), but had no luck there either. In any case, I am not convinced my issue is related to Transfer-Encoding: chunked at all.

At this point I am out of ideas, and I would really appreciate any suggestions! O_o

UPDATE:

I tried doing a manual GET with a classic URLConnection, and the string comes back fine:

URL url = new URL(Messages.SERVICE_URL + "myentity/id");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();

StringWriter writer = new StringWriter();
IOUtils.copy(is, writer, "UTF-8");

resp.getWriter().print(writer.toString()); 

So much for being all RESTful and fancy ...but still I have no clue why the original version doesn't work! :/
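The workaround above relies on IOUtils from Apache Commons IO. The same idea, decoding the bytes explicitly as UTF-8 instead of trusting a default, can be sketched with the standard library alone. `Utf8Copy` and `readUtf8` are illustrative names, and a `ByteArrayInputStream` stands in for `conn.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8Copy {

    // Reads the whole stream, decoding it as UTF-8 regardless of the platform default charset.
    public static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(): the UTF-8 bytes of a small JSON body.
        byte[] body = "{\"name\":\"Universit\u00e9 de Montr\u00e9al\"}".getBytes(StandardCharsets.UTF_8);
        String json = readUtf8(new ByteArrayInputStream(body));
        System.out.println(json);
    }
}
```

In the servlet, the decoded string still has to be encoded as UTF-8 on the way out, so resp.setCharacterEncoding("UTF-8") needs to be called before resp.getWriter().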

Poetize answered 31/12, 2011 at 6:6 Comment(3)
Chunked transfer encoding should be unrelated to charset problems... If you write the raw string in question to resp.getWriter(), bypassing Restlet entirely, is it transferred properly? – Fetal
You mean by doing a 'manual' GET to my service from the servlet? – Poetize
See update, it works OK if I bypass Restlet. I guess it's a Restlet bug or something O_o – Poetize

I tried doing a manual GET with a classic URLConnection, and the string comes back fine:

URL url = new URL(Messages.SERVICE_URL + "myentity/id");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();

StringWriter writer = new StringWriter();
IOUtils.copy(is, writer, "UTF-8");

resp.getWriter().print(writer.toString());

So much for being all RESTful and fancy ...but still I have no clue why the original version doesn't work! :/

Poetize answered 7/8, 2012 at 15:22 Comment(0)

Does your response contain the appropriate "Content-Type" header? It should be something like "Content-Type: application/json; charset=UTF-8" (note the charset).

Try starting your development server and retrieving your resource from the command line with cURL, then inspect the headers, e.g. curl -i http://localhost:8080/myentity/id. In theory, browsers should assume UTF-8 for JSON, but I wouldn't rely on that.
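To make the point about the charset parameter concrete, here is a rough sketch of how a client might pick the charset out of a Content-Type value and fall back to UTF-8 when none is given. `ContentTypeCharset` and `charsetOf` are names invented for this example, and the parsing is deliberately simplistic (it ignores edge cases such as escaped quotes).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value,
    // or UTF-8 if no charset is given or the name is not usable.
    public static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: fall back to UTF-8.
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/json; charset=UTF-8"));
        System.out.println(charsetOf("application/json"));
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
    }
}
```

A header without the parameter gives the receiver nothing to go on, which is why spelling out charset=UTF-8 on the servlet response is the safer choice.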

Pompom answered 24/3, 2012 at 12:8 Comment(0)
