servlet file upload filename encoding

4

19

I am using Apache Commons FileUpload for a standard file upload. My problem is that I cannot get the proper filename of an uploaded file if it contains special characters (á, é, ú, etc.). They all get converted to ? signs.

request.getCharacterEncoding() says UTF-8, but the bytes I get in the string fileItem.getName() are all the same for all my special characters.

Can you help me figure out what's wrong?

(Some details: using Firefox 3.6.12, Weblogic 10.3 on Windows)

This is my code snippet:

 public CommandMsg(HttpServletRequest request) {
    Enumeration names = null;
    if (isMultipart(request)) {
      FileItemFactory factory = new DiskFileItemFactory();
      ServletFileUpload upload = new ServletFileUpload(factory);
      try {
        List uploadedItems = upload.parseRequest(request);
        Iterator i = uploadedItems.iterator();
        FileItem fileItem = null;
        while (i.hasNext()) {
          fileItem = (FileItem) i.next();
          if (fileItem.isFormField()) {
            // System.out.println("isFormField");
            setAttribute(fileItem.getFieldName(), fileItem.getString());
          } else {
            String enc = "utf-8";
            enc = request.getCharacterEncoding();
            String fileName = fileItem.getName();
            byte[] fnb = fileItem.getName().getBytes();
            byte[] fnb2 = null;
            try {
                fnb2 = fileItem.getName().getBytes(enc);
                String t1 = new String(fnb);
                String t2 = new String(fnb2);
                String t3 = new String(fnb, enc);
                String t4 = new String(fnb2, enc);
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
            setAttribute(fileItem.getFieldName(), fileItem);
          }
        }
      } catch (FileUploadException ex) {
        ex.printStackTrace();
      }

// etc..
Principal answered 16/2, 2011 at 19:40 Comment(2)
Have you tried getting the file name both with and without decoding it? It is a bit odd that there is such a problem in a widely used library. - Dealing
Yes, as you can see, the first attempt is: "String fileName = fileItem.getName();" which gives the bad result. All the other lines (t1..t4) are only desperate attempts. :-) - Principal
18

I had the same problem and solved it like this.

ServletFileUpload upload = new ServletFileUpload(factory);
// 1st place: charset used to decode the part headers, including the filename
upload.setHeaderEncoding("UTF-8");

FileItemIterator iter = upload.getItemIterator(request);
while (iter.hasNext()) {
    FileItemStream item = iter.next();
    String name = item.getFieldName();
    InputStream stream = item.openStream();
    if (item.isFormField()) {
        // 2nd place: charset used to decode the form field's value
        String value = Streams.asString(stream, "UTF-8");
    }
}

If you based your code on the example provided in http://commons.apache.org/fileupload/streaming.html then you need to make sure you set UTF-8 in two places above.
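
To see why the header encoding matters, here is a minimal sketch that uses only the JDK, not Commons FileUpload itself. It assumes the browser sent the part header as UTF-8 bytes; the class name HeaderEncodingDemo, the decode helper and the sample filename are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderEncodingDemo {

    // Turn raw header bytes into a String using the given charset.
    // A multipart parser has to do the equivalent for every part header.
    static String decode(byte[] rawHeader, Charset charset) {
        return new String(rawHeader, charset);
    }

    public static void main(String[] args) {
        // Hypothetical part header; \u00e9 is the letter e with an acute accent.
        String sent = "Content-Disposition: form-data; name=\"file\"; "
                + "filename=\"\u00e9t\u00e9.txt\"";

        // Assumption: the browser encoded the header as UTF-8.
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset that was actually used restores the name.
        System.out.println(decode(wire, StandardCharsets.UTF_8));

        // Decoding with a different charset turns each accented letter
        // into two unrelated characters.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1));
    }
}
```

Only the first decode gives back the original filename. The second one shows roughly what a parser produces when its header charset does not match what the client sent.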

Capitalist answered 7/5, 2012 at 20:8 Comment(2)
Christoph, you made my day. It's a pity to have to write such boilerplate code, but it works well: I spent half a day looking at the HTML part while the "problem" was on the server side... ;) - Alcheringa
You don't need to explicitly handle the stream, you can just use FileItem#getString(String), where you specify the encoding as, e.g., "UTF-8": item.getString( "UTF-8" ). - Ceramist
2

You need to ensure that the target console/file/database/whatever where you're printing/writing/inserting the file name to supports UTF-8 as well. The question marks indicate that it isn't configured to accept UTF-8 and that the target itself is aware of that. Otherwise you would just have seen mojibake.
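
The difference between question marks and mojibake can be reproduced with the JDK alone. This is only a sketch; the class and method names are made up for illustration, and US-ASCII stands in for any target that cannot represent the character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkVsMojibake {

    // Encode into a target charset and read the bytes back with the same charset.
    // Characters the target cannot represent are replaced, typically by '?'.
    static String roundTrip(String s, Charset target) {
        return new String(s.getBytes(target), target);
    }

    // Encode with one charset but decode with another (a mismatch).
    static String misdecode(String s, Charset actual, Charset assumed) {
        return new String(s.getBytes(actual), assumed);
    }

    public static void main(String[] args) {
        String name = "\u00e1.txt"; // a with an acute accent, then ".txt"

        // The target knows it cannot store the character: question mark.
        String replaced = roundTrip(name, StandardCharsets.US_ASCII);
        System.out.println(replaced.equals("?.txt"));

        // The bytes are kept but interpreted with the wrong charset: mojibake.
        String garbled = misdecode(name, StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00c3\u00a1.txt"));
    }
}
```

Both comparisons print true. A ? means the character was already lost at an encoding step, while two odd characters in place of one mean the bytes are intact but were decoded with the wrong charset.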

Since the detail about the target is missing from the question, I can't do much more than suggest that you work through this article to understand what's going on with characters behind the scenes.

Farnese answered 17/2, 2011 at 0:23 Comment(4)
You are right, I did not provide information about the display target. Well, I saw the question marks while debugging in Eclipse's Variables view (where all special characters are OK in my program), also in the log4j log files, in the database where these names are inserted, and finally when the files were downloaded back to the client. - Principal
I always start examining such problems by debugging. If I can see the correct String in my watch window, then it is straightforward to track down where it goes wrong. However, in this case the filename is already incorrect at the very first point where I get it. - Principal
As I understand it, the browser states the encoding of its message in the HTTP header. When the request is parsed, this encoding should be used. The Apache javadoc also says for ServletFileUpload.setHeaderEncoding: "When not specified, or null, the request encoding is used." In my case enc = request.getCharacterEncoding(); resulted in "UTF-8", so I think this is what the browser sends. But why is the parser unable to get the correct filename then? :-( - Principal
Start by changing the Eclipse workspace encoding: Window > Preferences > General > Workspace, set Text file encoding to UTF-8. - Farnese
2

Solved the problem by calling ServletFileUpload instance's .setHeaderEncoding("ISO-8858-2") explicitly.

Principal answered 17/2, 2011 at 15:55 Comment(1)
Use ISO-8859-2 instead of ISO-8858-2, which is not supported by Java: docs.oracle.com/javase/7/docs/technotes/guides/intl/… We hit java.io.UnsupportedEncodingException: ISO-8858-2 when using the ISO-8858-2 encoding. DiskFileUpload upload = new DiskFileUpload(); upload.setHeaderEncoding("ISO-8859-2") - Liddy
0

For these special characters, you can set the encoding to "ISO-8859-1". UTF-8 does not seem to work.

If you do not set any encoding, a Linux machine will use its default encoding, which is usually UTF-8, and Windows will use its own default encoding.
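
A quick way to check what the platform default is on a given machine, as a sketch using only the JDK (the class and method names are made up for illustration). Note that newer Java versions (18 and later) default to UTF-8 on every platform, so the Linux/Windows difference applies mainly to older JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Depends on the JVM's default charset, so it can differ between machines.
    static byte[] platformBytes(String s) {
        return s.getBytes();
    }

    // Always the same bytes, whatever the platform default is.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "\u00e1"; // a with an acute accent

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Bytes with default charset: "
                + platformBytes(sample).length);
        System.out.println("Bytes with explicit UTF-8: "
                + utf8Bytes(sample).length);
    }
}
```

Calls such as getBytes() and new String(bytes) without a charset argument, as in the question's snippet, silently use this default, which is one reason the same code can behave differently on Linux and Windows.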

Sulfapyridine answered 8/5, 2012 at 7:9 Comment(0)
