Java Spring returning CSV file encoded in UTF-8 with BOM

Apparently, for Excel to open CSV files nicely, the file should have a Byte Order Mark (BOM) at the start. The CSV download is implemented by writing into the HttpServletResponse's output stream in the controller, since the data is generated during the request. I get an exception when I try to write the BOM bytes: java.io.CharConversionException: Not an ISO 8859-1 character: [] (even though the encoding I specified is UTF-8).


The controller method in question:

@RequestMapping("/monthly/list")
public List<MonthlyDetailsItem> queryDetailsItems(
        MonthlyDetailsItemQuery query,
        @RequestParam(value = "format", required = false) String format,
        @RequestParam(value = "attachment", required = false, defaultValue="false") Boolean attachment,
        HttpServletResponse response) throws Exception 
{   
    // load item list
    List<MonthlyDetailsItem> list = detailsSvc.queryMonthlyDetailsForList(query);
    // adjust format
    format = format != null ? format.toLowerCase() : "json";
    if (!Arrays.asList("json", "csv").contains(format)) format = "json";

    // modify common response headers
    response.setCharacterEncoding("UTF-8");
    if (attachment)
        response.setHeader("Content-Disposition", "attachment;filename=duomenys." + format);

    // build csv
    if ("csv".equals(format)) {
        response.setContentType("text/csv; charset=UTF-8");
        response.getOutputStream().print("\ufeff"); // <-- this is the line that throws the CharConversionException
        response.getOutputStream().write(buildMonthlyDetailsItemCsv(list).getBytes("UTF-8"));
        return null;
    }

    return list;
}
Pronunciamento answered 19/9, 2017 at 9:32 Comment(1)
Could you try response.getOutputStream().write("\ufeff".getBytes("UTF-8")); for the BOM part? – Rohde

I have just come across this same problem. The solution that works for me is to get the output stream from the response object and write to it as follows:

    // first create an array for the UTF-8 Byte Order Mark (0xEF 0xBB 0xBF)
    final byte[] bom = new byte[] { (byte) 239, (byte) 187, (byte) 191 }; 
    try (OutputStream os = response.getOutputStream()) {
        os.write(bom);

        final PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
        w.print(data);
        w.flush();
        w.close();
    } catch (IOException e) {
        // log it
    }

So UTF-8 is specified on the OutputStreamWriter.
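
Applied to the controller from the question, a minimal sketch of the same approach could look like this (assuming, as in the question, that buildMonthlyDetailsItemCsv returns the CSV as a String; OutputStream, OutputStreamWriter, PrintWriter and StandardCharsets come from java.io and java.nio.charset):

if ("csv".equals(format)) {
    response.setContentType("text/csv; charset=UTF-8");
    OutputStream os = response.getOutputStream();
    // write the UTF-8 BOM (0xEF 0xBB 0xBF) as raw bytes, bypassing ServletOutputStream.print()
    os.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });
    // then write the CSV payload through a UTF-8 writer
    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
    w.print(buildMonthlyDetailsItemCsv(list));
    w.flush();
    return null;
}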


As an addendum to this, I should add that the same application needs to allow users to upload files, which may or may not have BOMs. This can be handled with the class org.apache.commons.io.input.BOMInputStream, which is then used to construct an org.apache.commons.csv.CSVParser. BOMInputStream includes a hasBOM() method to detect whether the file has a BOM. One gotcha I first fell into is that hasBOM() reads (obviously!) from the underlying stream, so the way to deal with this is to mark the stream first and then, if it turns out not to have a BOM, reset it. The code I use for this looks like the following:

try (InputStream is = uploadFile.getInputStream();
        BufferedInputStream buffIs = new BufferedInputStream(is);
        BOMInputStream bomIn = new BOMInputStream(buffIs)) {
    buffIs.mark(LOOKAHEAD_LENGTH);
    // this should allow us to deal with CSVs with or without BOMs
    final boolean hasBOM = bomIn.hasBOM();
    final BufferedReader buffReadr = new BufferedReader(
        new InputStreamReader(hasBOM ? bomIn : buffIs, StandardCharsets.UTF_8));

    // if this stream does not have a BOM, then we must reset the stream as the test
    // for a BOM will have consumed some bytes
    if (!hasBOM) {
        buffIs.reset();
    }

    // collect the validated entity details
    final CSVParser parser = CSVParser.parse(buffReadr,
        CSVFormat.DEFAULT.withFirstRecordAsHeader());
    // Do stuff with the parser
    ...
// Catch and clean up
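
For completeness, consuming the parser could look like the following minimal sketch; the column name "email" and the processing step are purely hypothetical, and CSVRecord comes from org.apache.commons.csv:

for (final CSVRecord record : parser) {
    // withFirstRecordAsHeader() lets values be looked up by header name
    final String email = record.get("email"); // "email" is a hypothetical column name
    // validate / persist the value here
}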

Hope this helps someone.

Douzepers answered 9/9, 2019 at 12:43 Comment(1)
This really helped me, but I found that I had to write the BOM twice or it would not be present in the saved CSV file. Not sure why. Maybe the first one is consumed by the browser and the second saved to the CSV file on the user's computer. It was consistent across 4 different browsers. – Clothesline

It doesn't make much sense: the BOM is for UTF-16; there is no byte order with UTF-8. The encoding you've set with setCharacterEncoding is used for getWriter, not for getOutputStream.

UPDATE:

OK, try this:

if ("csv".equals(format)) {
    response.setContentType("text/csv; charset=UTF-8");
    PrintWriter out = response.getWriter();
    out.print("\uFEFF");
    out.print(buildMonthlyDetailsItemCsv(list));
    return null;
}

I'm assuming that method buildMonthlyDetailsItemCsv returns a String.
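
For illustration only, a hypothetical sketch of what such a method might look like (the header row and the item accessors are invented; the real implementation is not shown in the question):

// hypothetical sketch; getMonth()/getAmount() are invented accessors for illustration
private String buildMonthlyDetailsItemCsv(List<MonthlyDetailsItem> list) {
    StringBuilder sb = new StringBuilder("month;amount\n"); // invented header row
    for (MonthlyDetailsItem item : list) {
        sb.append(item.getMonth()).append(';').append(item.getAmount()).append('\n');
    }
    return sb.toString();
}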

Archibaldo answered 19/9, 2017 at 9:48 Comment(4)
This is not true. See en.wikipedia.org/wiki/Byte_order_mark. There is a BOM defined for UTF-8 but it is not mandatory. Alas, Java is not able to handle UTF-8 BOMs automatically. – Nabors
@Nabors Well it's useless with UTF-8 encoding. – Archibaldo
Yes, maybe. But that is not the point. Sometimes you have to deal with it, useless or not. – Nabors
With regard to my answer, I had to deal with files that had been created by Excel and that had BOMs; the procedure I explained worked for me. @MauricePerry the BOM value is not regular content, so it is ignored as text, which I suspect is why Excel has used it as a (I guess) metadata value. However, programs still have to deal with it. – Douzepers
