request.getCharacterEncoding() returns NULL... why?

A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc.). We're using KonaKart, a Java e-commerce platform, on Struts 1.

I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing it with a similar, properly functioning form, I noticed that on the old form request.getCharacterEncoding() returns "UTF-8", but on the new form it returns null, and the text coming out of request.getParameter() is already mangled.

Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.

Things I've ruled out:

  • Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  • Both form tags in the HTML use POST, and do not set encodings
  • Checking from Firebug, both the Request and Response headers have the same properties
  • Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
  • There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
  • I've checked the source file encodings; they're all set to "Default - inherited from Container: UTF-8"

If I convert the parameter values from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue. For example:

new String(request.getParameter("firstName").getBytes("ISO-8859-1"), "UTF8")
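
To see why that workaround works, here is a small stand-alone sketch (the class and method names are illustrative, not from KonaKart or Struts). It assumes the browser sent UTF-8 bytes and the container decoded them as ISO-8859-1. Because ISO-8859-1 maps every byte to exactly one character, re-encoding with it recovers the original bytes, which can then be decoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {

    // Simulates the bug (assumption): the browser sends UTF-8 bytes,
    // but the container decodes them as ISO-8859-1.
    static String decodeAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the question: re-encode as ISO-8859-1 (a lossless
    // one-byte-per-char mapping) to get the raw bytes back, then decode as UTF-8.
    static String repair(String mangled) {
        byte[] rawBytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String firstName = "H\u00e9l\u00e8ne"; // "Hélène", escaped to avoid source-encoding issues
        String mangled = decodeAsLatin1(firstName);
        String repaired = repair(mangled);
        System.out.println(mangled.equals(firstName));  // false: accents were mangled
        System.out.println(repaired.equals(firstName)); // true: round trip restored them
    }
}
```

This only repairs the symptom after the fact; it relies on the wrong decoding having been ISO-8859-1 specifically.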

Any suggestions are welcome, I'm all out of ideas.

Hayleyhayloft asked 10/9, 2012 at 19:20

Modern browsers usually don't supply the character encoding in the HTTP request's Content-Type header. For HTML form-based applications, however, it is the same character encoding as the one specified in the Content-Type header of the initial HTTP response that served the page with the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is UTF-8.

request.setCharacterEncoding("UTF-8");

Do this before any request parameter has been retrieved from the request. Otherwise it's too late: the server platform's default encoding, which is indeed often ISO-8859-1, will already have been used to parse the parameters. A servlet filter mapped to /* is a perfect place for this.
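
A rough way to see the "too late" problem without a servlet container: the sketch below uses the JDK's URLDecoder (the Charset overload needs Java 10+) to decode the same application/x-www-form-urlencoded body twice. parseParam is a hypothetical, simplified stand-in for the container's own parameter parsing, and the body is an assumed example of what a browser sends for "Hélène" from a UTF-8 page.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormBodyDecoding {

    // Hypothetical stand-in for the container's parameter parsing: split a
    // form-urlencoded body and percent-decode one field's value with the given
    // charset. Returns null if the field is absent.
    static String parseParam(String body, String name, Charset charset) {
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            if (URLDecoder.decode(key, charset).equals(name)) {
                return URLDecoder.decode(value, charset);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed example: the percent-encoded UTF-8 bytes of "Hélène".
        String body = "firstName=H%C3%A9l%C3%A8ne&lastName=Dupont";
        String expected = "H\u00e9l\u00e8ne";

        // No request encoding set: parsing falls back to ISO-8859-1.
        String withDefault = parseParam(body, "firstName", StandardCharsets.ISO_8859_1);
        // Equivalent of calling setCharacterEncoding("UTF-8") before the first getParameter().
        String withUtf8 = parseParam(body, "firstName", StandardCharsets.UTF_8);

        System.out.println(expected.equals(withDefault)); // false
        System.out.println(expected.equals(withUtf8));    // true
    }
}
```

In a real application, the equivalent of choosing UTF_8 here is a filter whose doFilter calls request.setCharacterEncoding("UTF-8") and then chain.doFilter(request, response).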

Endive answered 10/9, 2012 at 19:25
Thank you! This pointed me in the right direction. The encoding gets set to UTF-8 in the reset() method of the BaseValidatorForm class, but the form class my coworker wrote overrode reset() without calling super.reset(). - Hayleyhayloft
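
The bug described in that comment reduces to plain Java. This is a hypothetical sketch: FakeRequest, BaseForm and the two contact-form classes are stand-ins, and it assumes, as the comment says, that the base form's reset() is what sets UTF-8. In Struts 1 the real method is reset(ActionMapping mapping, HttpServletRequest request), so the actual fix would be a super.reset(mapping, request) call inside the override.

```java
public class ResetOverrideDemo {

    // Hypothetical stand-in for HttpServletRequest: only tracks the encoding.
    static class FakeRequest {
        private String characterEncoding; // null until someone sets it

        String getCharacterEncoding() { return characterEncoding; }
        void setCharacterEncoding(String enc) { characterEncoding = enc; }
    }

    // Stand-in for BaseValidatorForm, assuming its reset() sets UTF-8.
    static class BaseForm {
        void reset(FakeRequest request) {
            request.setCharacterEncoding("UTF-8");
        }
    }

    // The bug: overriding reset() without calling super.reset().
    static class BrokenContactForm extends BaseForm {
        String firstName;

        @Override
        void reset(FakeRequest request) {
            firstName = null; // BaseForm.reset() never runs, so no encoding is set
        }
    }

    // The fix: call super.reset() first, then clear this form's own fields.
    static class FixedContactForm extends BaseForm {
        String firstName;

        @Override
        void reset(FakeRequest request) {
            super.reset(request);
            firstName = null;
        }
    }

    public static void main(String[] args) {
        FakeRequest broken = new FakeRequest();
        new BrokenContactForm().reset(broken);
        System.out.println(broken.getCharacterEncoding()); // null

        FakeRequest fixed = new FakeRequest();
        new FixedContactForm().reset(fixed);
        System.out.println(fixed.getCharacterEncoding()); // UTF-8
    }
}
```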

The value of request.getCharacterEncoding() comes from the Content-Type request header, not from Accept-Charset.

So a Content-Type of application/x-www-form-urlencoded;charset=ISO-8859-1 should work for the POST action. The <%@page %> directive doesn't affect the POST data.
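
As a simplified illustration of where the null comes from, the hypothetical helper below extracts the charset parameter from a Content-Type header value and returns null when none is present, which is the usual case for a browser form POST. It is a sketch under that assumption, not the container's actual header parser.

```java
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: returns the charset parameter of a Content-Type
    // header value, or null when there is none.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type itself
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Typical browser form POST: no charset parameter, so null.
        System.out.println(charsetFromContentType("application/x-www-form-urlencoded"));
        // A Content-Type that does carry one.
        System.out.println(charsetFromContentType("application/x-www-form-urlencoded; charset=ISO-8859-1"));
    }
}
```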

Fructidor answered 10/9, 2012 at 19:26
It doesn't (at least for Firefox). - Vano
