UTF-8 encoding issue when exporting csv file in JavaScript
I

4

33

I use the function below to export an array to a CSV file in JavaScript, but Chinese characters come out garbled when the file is opened in Microsoft Excel 2013 on Windows 7.

When I open the exported file in Notepad, it displays correctly.

function arrayToCSVConvertor(arrData, reportTitle) {
    var CSV='';
    arrData.forEach(function(infoArray, index){
        var dataString = infoArray.join(",");
        dataString= dataString.split('\n').join(';');
        CSV += dataString+ "\n";
    });

    if (CSV == '') {
        alert("Invalid data");
        return;
    }

    //create a link and click, remove
    var link = document.createElement("a");
    link.id="lnkDwnldLnk";

    //this part will append the anchor tag and remove it after automatic click
    document.body.appendChild(link);

    var csv = CSV;

    var blob = new Blob([csv], { type: 'text/csv;charset=UTF-8' });//Here, I also tried charset=GBK , and it does not work either
    var csvUrl = URL.createObjectURL(blob);

    var filename = reportTitle+'.csv';

    if(navigator.msSaveBlob){//IE 10
        return navigator.msSaveBlob(blob, filename);
    }else{
        $("#lnkDwnldLnk")
            .attr({
                'download': filename,
                'href': csvUrl
            });
        $('#lnkDwnldLnk')[0].click();
        document.body.removeChild(link);
    }
}
Imprecate answered 12/8, 2015 at 8:6 Comment(0)
I
78

Problem solved by adding a BOM (byte order mark) at the start of the CSV string:

var csv = "\ufeff"+CSV;
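As a minimal sketch of how that line fits the loop from the question: `buildCsvWithBom` is a hypothetical helper name, and the sample rows are made up for illustration. It assumes every row is an array of plain values.

```javascript
// Hypothetical helper: mirrors the loop in the question, then prepends the BOM.
function buildCsvWithBom(arrData) {
    var CSV = '';
    arrData.forEach(function (infoArray) {
        // Same handling as the question: join with commas, replace embedded newlines.
        var dataString = infoArray.join(',').split('\n').join(';');
        CSV += dataString + '\n';
    });
    // U+FEFF at the very start is what Excel looks for to detect UTF-8.
    return '\ufeff' + CSV;
}

var csv = buildCsvWithBom([['姓名', '城市'], ['张伟', '北京']]);
```

The returned string can then be passed to `new Blob([csv], ...)` exactly as in the question.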
Imprecate answered 14/8, 2015 at 4:34 Comment(1)
Awesome. It worked. But can you explain the solution a bit? Stubblefield
M
27

This is my solution:

var blob = new Blob(["\uFEFF"+csv], {
    type: 'text/csv; charset=utf-8'
});
Macmillan answered 20/11, 2018 at 21:9 Comment(0)
V
3
var csv = "\ufeff"+CSV;

Explanation about this code:

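The BOM (byte order mark, U+FEFF, written "\ufeff" in JavaScript) is a special Unicode character placed at the very start of a text stream. In UTF-16 and UTF-32 it indicates the byte order. In UTF-8, where byte order is not an issue, it is encoded as the three bytes EF BB BF and serves only as a signature that the text is UTF-8.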

Some software applications require the BOM character to be present in UTF-8 encoded files to recognize the file as a UTF-8 encoded text file. For example, Microsoft Excel may not recognize a UTF-8 encoded CSV file without a BOM character, and may display the characters incorrectly.

Therefore, adding the BOM character to the start of the CSV data string helps applications that rely on it, including Excel on Windows, recognize the resulting file as UTF-8 encoded text.
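To see what actually ends up in the file, here is a small sketch that encodes a BOM-prefixed string to UTF-8 and prints the bytes. It assumes an environment where `TextEncoder` is available as a global (modern browsers and recent Node.js); the sample string is illustrative only.

```javascript
// TextEncoder always produces UTF-8, which is also how Blob encodes string parts.
var bytes = new TextEncoder().encode('\ufeff' + 'a');

// Format each byte as two hex digits for readability.
var hex = Array.from(bytes, function (b) {
    return b.toString(16).padStart(2, '0');
}).join(' ');

console.log(hex); // ef bb bf 61
```

The first three bytes are the UTF-8 signature, followed by the byte for "a".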

Varien answered 24/3, 2023 at 19:49 Comment(0)
A
0

According to RFC 2781, U+FEFF is the byte order mark (BOM) for UTF-16; in the little-endian form (UTF-16LE) it is serialized as the bytes 0xFF 0xFE. While adding the BOM may resolve the issue on Windows, in my experience the problem still exists when the generated CSV file is opened with Excel on macOS.

A solution for writing a multibyte CSV file that works across different OS platforms (Windows, Linux, MacOS) applies these three rules:

  1. Separate the fields with a tab character instead of a comma
  2. Encode the content as UTF-16LE
  3. Prefix the content with the UTF-16LE BOM, U+FEFF, which is written as the bytes 0xFF 0xFE
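The three rules above can be sketched as follows. `encodeTsvUtf16le` is a hypothetical helper name, the sample row is made up, and the code assumes field values contain no tabs or newlines. Strings in JavaScript are already sequences of UTF-16 code units, so each unit is simply written low byte first.

```javascript
// Hypothetical helper: tab-separated rows encoded as UTF-16LE with a BOM.
function encodeTsvUtf16le(rows) {
    var text = rows.map(function (row) {
        return row.join('\t');
    }).join('\n') + '\n';

    // 2 bytes for the BOM plus 2 bytes per UTF-16 code unit.
    var bytes = new Uint8Array(2 + text.length * 2);
    bytes[0] = 0xFF; // BOM U+FEFF, low byte first
    bytes[1] = 0xFE;
    for (var i = 0; i < text.length; i++) {
        var unit = text.charCodeAt(i);
        bytes[2 + i * 2] = unit & 0xFF; // low byte
        bytes[3 + i * 2] = unit >> 8;   // high byte
    }
    return bytes;
}

var utf16Bytes = encodeTsvUtf16le([['a', '中']]);
```

The resulting `Uint8Array` can be passed to `new Blob([utf16Bytes], ...)` in place of the string; which MIME type and charset label work best there is not something this sketch verifies.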
Aneto answered 30/3, 2020 at 3:27 Comment(0)
