write UTF-8 BOM with supercsv

2

5

I am using supercsv to write a UTF-8 encoded CSV file. It produces a valid file, but Excel does not recognize it as UTF-8 without the BOM marker, so any special characters are corrupted when the file is opened in Excel.

Is there a way to write a file as UTF-8 with a BOM using supercsv? I can't find it.

Thanks

Accuracy answered 18/8, 2015 at 12:8 Comment(0)
C
10

As supercsv probably wraps a Writer:

Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
writer.write('\uFEFF'); // BOM for UTF-*
... new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);
Cod answered 18/8, 2015 at 12:18 Comment(2)
Thanks @JoopEggen, that was what I was looking for. This is what it looks like:
OutputStreamWriter o = new OutputStreamWriter(out);
o.write('\uFEFF'); // BOM
writer = new CsvBeanWriter(o, CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
Accuracy
@Accuracy It is better to pass UTF-8 to the new OutputStreamWriter call; otherwise the platform default encoding is used, which is not portable. Cod
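To make the advice in this answer and its comments concrete, here is a minimal sketch using only the JDK. The class name BomCsvSketch and the helper writeCsvWithBom are hypothetical, and plain text stands in for the rows that Super CSV would write; the assumption is that a CsvBeanWriter would simply be constructed around the same Writer after the BOM has been written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class BomCsvSketch {

    // Hypothetical helper: writes a UTF-8 BOM followed by the given CSV text.
    static void writeCsvWithBom(OutputStream out, String csv) throws IOException {
        // Name the charset explicitly so the result does not depend on the platform default.
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // Write the BOM as a single character; the encoder produces its byte form.
            writer.write('\uFEFF');
            // Assumption: with Super CSV, this same writer would be wrapped by the CSV writer here.
            writer.write(csv);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCsvWithBom(out, "name;city\nJos\u00E9;M\u00E1laga\n");
        byte[] bytes = out.toByteArray();
        // Print the first three bytes of the output in hex.
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);
    }
}
```

The first three bytes of the output should be EF BB BF, which is the UTF-8 encoding of U+FEFF. The BOM has to be written before anything else, so it must go to the Writer before the CSV writer emits a header or any rows.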
C
1

In my experience, MS Excel always opens CSV files in the default MS Office charset. In my case, it was always Windows-1252 (Spain), even on non-Windows machines (MS Office for OS X). The only way to deal with it was to write the CSV files in this charset.

byte[] csvFileBytes = dataObject.toCSVString().getBytes(Charset.forName("Windows-1252"));

MS Excel seems to never use another charset to open CSV files. You can check this post: Is it possible to force Excel recognize UTF-8 CSV files automatically?
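One caveat with this approach is that Windows-1252 cannot represent every character. The sketch below is only an illustration with hypothetical names (Cp1252Sketch, fitsInCp1252), and it assumes the JVM ships the windows-1252 charset, which common JDK builds do but which is not among the charsets every Java platform is required to support. It checks whether a string survives the conversion before encoding it.

```java
import java.nio.charset.Charset;

class Cp1252Sketch {

    // Assumption: this charset is available in the running JVM.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: true if every character of the text exists in Windows-1252.
    static boolean fitsInCp1252(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String spanish = "Jos\u00E9;M\u00E1laga"; // accented Latin letters
        String japanese = "\u6771\u4EAC";         // two CJK characters

        System.out.println(fitsInCp1252(spanish));
        System.out.println(fitsInCp1252(japanese));

        // String.getBytes silently substitutes the charset's replacement byte
        // for characters it cannot map, so data loss goes unnoticed without a check.
        byte[] lossy = japanese.getBytes(CP1252);
        System.out.println(lossy.length);
    }
}
```

For text limited to Western European languages the check passes, which matches the experience described above. For anything outside that range, the UTF-8 with BOM approach from the other answer avoids the silent substitution.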

Cedeno answered 18/8, 2015 at 12:26 Comment(7)
That is untrue; if the BOM is present in the file, then Excel will open the file with the correct encoding. Why it doesn't use UTF-8 by default is a mystery, though. Betthezul
Are you sure of this? Does MS Excel interpret the file's BOM? Cedeno
Yes, I'm sure; try the answer above, i.e. write the BOM before writing anything else in the file. Betthezul
Well, I've tried to use CSV files with the BOM on MS Office 2011 for Mac and for Windows (Spanish versions) and I couldn't get it to work properly. That's why I had to encode it in Windows-1252. Cedeno
Please fge, can you help me understand why this code does not write the BOM to a CSV file?
String fileName = "a.csv";
File file = FileUtils.getFile(fileName);
FileWriter fw = new FileWriter(file);
char[] cbuf = { 0xef, 0xbb, 0xbf }; // BOM
fw.write(cbuf);
fw.write("aaáa;eé;cccÇÇÇ;\niií;oóó;uuúúü");
fw.flush();
fw.close();
Cedeno
Don't do it this way, do it as per the first answer; simply write char \ufeff. The writer will encode it for you (that's what a Writer is for; similarly, a Reader will decode the bytes as chars). Betthezul
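The difference between the two approaches in the comments above can be seen by encoding both variants and comparing the bytes. This is a small illustrative sketch with hypothetical names (BomEncodingSketch, hex), and it uses UTF-8 explicitly rather than a FileWriter so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

class BomEncodingSketch {

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three separate characters U+00EF, U+00BB, U+00BF: each one is encoded on its own.
        String threeChars = new String(new char[] { 0xef, 0xbb, 0xbf });
        // One character U+FEFF: the encoder emits the actual BOM byte sequence.
        String bomChar = "\uFEFF";

        System.out.println(hex(threeChars.getBytes(StandardCharsets.UTF_8))); // C3 AF C2 BB C2 BF
        System.out.println(hex(bomChar.getBytes(StandardCharsets.UTF_8)));    // EF BB BF
    }
}
```

A Writer accepts characters, not bytes, so the three values in the char array are treated as three characters and come out as six bytes in UTF-8. A FileWriter created without an explicit charset also uses the platform default encoding; where that default is a single-byte charset such as Windows-1252, the three bytes would appear to be correct, but the rest of the file would then not be UTF-8, so the BOM would be misleading.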