Is it possible to force Excel recognize UTF-8 CSV files automatically?
Asked Answered
B

34

632

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, cyrillic letters, Greek letters) in Excel does not achieve the expected results showing something like Г„/Г¤, Г–/Г¶. And I don't know how to force Excel understand that the open CSV file is encoded in UTF-8. I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that.

Is there any workaround?

P.S. Which tools may potentially behave like Excel does?


UPDATE

I have to say that I've confused the community with the formulation of the question. When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user, in a fluent and transparent way. However, I used a wrong formulation asking for doing it automatically. That is very confusing and it clashes with VBA macro automation. There are two answers for this questions that I appreciate the most: the very first answer by Alex, and I've accepted this answer; and the second one by Mark that have appeared a little later. From the usability point of view, Excel seemed to have lack of a good user-friendly UTF-8 CSV support, so I consider both answers are correct, and I have accepted Alex's answer first because it really stated that Excel was not able to do that transparently. That is what I confused with automatically here. Mark's answer promotes a more complicated way for more advanced users to achieve the expected result. Both answers are great, but Alex's one fits my not clearly specified question a little better.


UPDATE 2

Five months later after the last edit, I've noticed that Alex's answer has disappeared for some reason. I really hope it wasn't a technical issue and I hope there is no more discussion on which answer is greater now. So I'm accepting Mark's answer as the best one.

Busby answered 14/5, 2011 at 13:53 Comment(16)
Can you use tab-delimited files? If so, you may have more luck.Bootblack
Office 2010+: best works: UTF-16LE with BOM and \t as delimiter. Will work in English and non English Excel setting. Can press Ctrl-S without selecting the file format, etc. Will preserve Unicode chars.Inlier
@Duncan frankly speaking, I'm confused and it's hard "to judge". I've accepted alex's answer more than 2 years ago (Q and A both on May 14), and then Mark answered suggesting a tricky workaround (I actually didn't check it) that surprisingly scored scored more points at my question that surprisingly became popular as well. We actually refused Excel support in my project long time ago, and I don't want to take away alex's reputation score, because Excel seemed, and probably still seems (please correct me if I'm wrong), to mishandle CSV. It's hard to say, sorry.Busby
Alex's answer was deleted by a moderator. No explanation was given, but if you have enough reputation you can still see it.Swimmingly
Excel 2010 works fine with UTF8 BOM for me so don't understand that it didn't work for you. I use ";" as delimiter. Double clicking csv files of this type opens correctly.Lidstone
@Lidstone And you put a down-vote just because your Excel 2010 experience differs from what was mine more than 6 years ago? Nice. I claim that it did not work for Excel in a straight-forward way at that moment.Busby
Since time machines don't exist yet well just have to disagree.Lidstone
@Lidstone All the people who have answered the question had the same issue and suggested their workarounds. Still disagree?Busby
Several comments in addition to mine says it works with UTF8 BOM, even from Excel 2007. I tested Excel 2007 myself and it works. So "I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that." is not true.Lidstone
@Lidstone You must have used either data import or whatever else, but definitely not the regular open function I was asking in the original question -- so that it might work for you. Again, if your claim against the original question issue was true, I would never post such a question, or I'd be notified within the 6 years period at least once by whoever else, and probably within the very first day. Some folks say it works, some say it doesn't. I say it doesn't. Please read the question more carefully and take into account other factors you're probably missing. Thanks.Busby
Yes, the regular open by (double click) work with UTF8 BOM. Tested with both Excel 2007 and 2010. Please leave me alone.Lidstone
I visited this page - constitution.kremlin.ru - which is a UTF-8 page as evidented by the tag "<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />". I saved the page as a .txt file and then opened it with Excel 2010. The Text Import Wizard was automatically invoked and it showed the file origin to be "65001:Unicode (UTF-8). I simply clicked Finish and Excel displayed the page in Russian which I can't read. When you people open a UTF-8 file, isn't Text Import Wizard automatically invoked?Bernice
@yky I don't really remember, and don't even remember the exact version I tested the issue with. Maybe you have some Excel updates since then?Busby
I use excel 2011 mac version. It can not handle utf-8 with BOM.Troposphere
Correct answer should probably be "Do not use Microsoft". There are better alternatives, e.g. free and open-source LibreOffice suite, specifically, LibreOffice Calc. It's 2020 but Excel mobile app still cannot display UTF-8 properly. This is ridiculous.Marquardt
Isn't it crazy that 12 years after this thread, I ran into EXACTLY the same problem today with Excel? Translated data into Spanish and then extracted it as a CSV, and tried to open it in Excel. Accented characters look like garbage. Do Microsoft actually listen to anyone who uses their products?Mezzorilievo
T
499

Alex is correct, but as you have to export to csv, you can give the users this advice when opening the csv files:

  1. Save the exported file as a csv
  2. Open Excel
  3. Import the data using Data-->Import External Data --> Import Data
  4. Select the file type of "csv" and browse to your file
  5. In the import wizard change the File_Origin to "65001 UTF" (or choose correct language character identifier)
  6. Change the Delimiter to comma
  7. Select where to import to and Finish

This way the special characters should show correctly.

Teage answered 27/6, 2011 at 2:19 Comment(1)
Nothing "Automatic" about this answerSupramolecular
D
224

The UTF-8 Byte-order mark (BOM) will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).

In case anybody is having the same issues I was, .NET's UTF8 encoding class does not output a byte-order marker in a GetBytes() call. You need to use streams (or use a workaround) to get the BOM to output.

Dye answered 9/7, 2012 at 16:40 Comment(19)
Thanks for the post link (I didn't know about the UTF-16 case).Busby
Inserting a UTF-8 BOM seems to be the way to go. After that it's transparent.Mirellamirelle
Thank you very much. This is way better than trying to guess what encoding excel fancies.Cinda
I use Notepad++ to easily convert the .csv from UTF-8 to UTF-8 with BOMExuberant
Yes - best solution. Lots of confusing posts around setting headers in http responses. This solves the problem. Same can be seen when opening the file in notepad and saving right back down using the UTF-8 option. Also adds the byte-order marker.Hollie
Solution from the post referenced: private byte[] AddEncodingPreamble(byte[] byteArray, Encoding encoding) { return encoding.GetPreamble().Concat(byteArray).ToArray(); }Hollie
You did not read the question! Lyubomyr Shaydariv wrote: " I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."Fewer
@Elmue: I figured maybe (like me) he thought he was saving it with a byte-order marker, but wasn't because some library didn't work how he thought it did. I found it worthwhile to add this answer, and evidently a number of other people have found it useful too. At any rate, it seems like a better answer than saying not to use CSV, or instructing users how to open the file in some bizarre way.Dye
No. The cause that it did not work for him is not that he was unable to save a simple 3 bytes to a file. The cause that it does not work is that at least Excel 2007 IGNORES the BOM completely.Fewer
@Elmue: I'll grant that it's an imperfect solution. If there were any way to automatically make this work for older versions of Excel, I'd prefer them. But recognizing that there is no solution to the problem, and at least 69 people have found this answer to be helpful so far, I'm going to leave it here. Thank you for your observations.Dye
I use plain old Notepad to easily convert .csv from UTF-8 to UTF-8 with BOM. Just open and save.Fluid
The Mac version of Excel seems to still ignore the BOM. (Excel for Mac 2011.)Gravedigger
.NET does output a BOM if you employ the streams correctly, see here: #5266569Vinegary
@marsze: Thanks for pointing that out. There is another answer on the link that I posted which uses streams. I guess the lesson is that GetBytes() was never intended to be used to produce an entire file's contents by itself: we're supposed to use streams and let them call into GetPreamble() and GetBytes() and such.Dye
@Dye Very true, a BOM belongs only at the start of streams. What I like about that other answer (except that it's basically just another way of writing it: string.ToBytes(encoding) instead of encoding.ToBytes(string)) is that you can pass new UTF8Encoding(true) or new UTF8Encoding(false) for controlling the output of a BOM or not.Vinegary
Confirmed Excel for Mac 2011 reads the BOM as a wrong char in the first column and ignores the file is utf-8. So this solution won't work for that version.Embryotomy
Just tested this with Excel for Mac (v 16.16) and it does recognize it, and reads the rest of the UTF-8 characters correctly. Without it you still get garbage.Missile
Excel Office 365 correctly reads CSV files with a UTF-8 BOM. One side note is that if you have a TSV (tab-separated values) file, it will recognize the BOM, but will not recognize the tabs as separators. Removing the BOM makes it recognize the tabs correctly, but then multi-byte characters are mangled. As a workaround, import the TSV in LibreOffice Calc and save it as a UTF-8 encoded CSV.Lepidopteran
My testing with Excel 2007 (yes, quite an old version, I know) showed that a CSV with BOM did NOT open correctly when opening the CSV into Excel directly from Windows Explorer. However opening it within Excel (File > Open) launched the "Import Wizard" which did have UTF-8 set correctly by default, but required me to manually specify that it was comma separated. Will have to use that long-winded method each time I open the file.Francinefrancis
B
93

The bug with ignored BOM seems to be fixed for Excel 2013. I had same problem with Cyrillic letters, but adding BOM character \uFEFF did help.

Boss answered 26/4, 2013 at 7:52 Comment(11)
Since I added \uFEFF at the beginning of my CSV file (generated in Java), Excel is able to open them correctly! ThxExhalation
that's solved my issue too , in php it looks like this: $utf8_with_bom = chr(239) . chr(187) . chr(191) . $csvText;Turkoman
UTF-8, by definition, does not use nor should use the BOM characters. The way Excel reads BOM screws up the advantage of UTF-8 over Unicode, which is backwards compatibility with ASCII. Adding the BOM will make Excel work, but break other proper UTF-8/ASCII file reads.Compositor
@Compositor the BOM shouldn't be necessary, but it is. Microsoft values backwards compatibility and Windows programs have always assumed a file would be encoded with a MBCS character set. Changing that assumption would break too many legacy systems. It would be nice if you could specify UTF-8 as your MBCS encoding, but they've never gotten that to work consistently.Swimmingly
@MarkRansom but that's not UTF-8 specification at all. By definition, UTF-8 is suppose to be compatible with standard ASCII at the binary level, which means no characters above 127. Everything else under is suppose to be straight up "standard" with no special characters. What Microsoft should be doing is DEFAULT UTF-8 if there are no BOM, not default to their MS craptastic encoding.Compositor
@Compositor no, what Microsoft "should" be doing is making sure that things that used to work, continue to work. And that's exactly what they're doing. I know the spec says it isn't necessary, and in a perfect world it wouldn't be. But that's not the world we live in.Swimmingly
@MarkRansom Microsoft does an atrocious job of keeping things working. Just look at this thorough report on the Excel inconsistencies in dealing with BOM: wiki.scn.sap.com/wiki/display/ABAP/…Compositor
@Compositor Wikipedia seems to think that UTF-8 does have a BOM The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF. Anyway, CSV is an imprecise format. Moreover, the link you provided shows limited knowledge of non-US cultures - ; is the list separators for several European languages because , is the decimal separator and can't be used as a list separator. If one wants fidelity, one should use the XLSX format - it's just a compressed XML with a very well defined schema.Jestude
\FE\FF makes Mac OS Excel 2016 the whole file as Japanese hieroglyphs.Endurance
Works for me too. The BOM character needs to be at the beginning of the file.Fulviah
Given the strong views on either side, I'd say that there should be an option in the Excel settings for how the user prefers to treat CSVs that are opened directly. I would like to be able to double-click a CSV and have Excel default to assuming it's Unicode.Riot
F
81

It is incredible that there are so many answers but none answers the question:

"When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,..."

The answer marked as the accepted answer with 200+ up-votes is useless for me because I don't want to give my users a manual how to configure Excel. Apart from that: this manual will apply to one Excel version but other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.

So the question is how to make Excel show UTF8 data with a simple double click?

Well at least in Excel 2007 this is not possible if you use CSV files because the UTF8 BOM is ignored and you will see only garbage. This is already part of the question of Lyubomyr Shaydariv:

"I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."

I make the same experience: Writing russian or greek data into a UTF8 CSV file with BOM results in garbage in Excel:

Content of UTF8 CSV file:

Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ

Result in Excel 2007:

CSV UTF8 Excel

A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in control panel if comma or semicolon is used as separator. So the same CSV file may open correctly on one computer but on anther computer not. "CSV" means "Comma Separated Values" but for example on a german Windows by default semicolon must be used as separator while comma does not work. (Here it should be named SSV = Semicolon Separated Values) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem to the UTF-8 problem.

Excel exists since decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.


However, if you put the same values into a HTML file and save that file as UTF8 file with BOM with the file extension XLS you will get the correct result.

Content of UTF8 XLS file:

<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>

Result in Excel 2007:

UTF8 HTML Excel

You can even use colors in HTML which Excel will show correctly.

<style>
.Head { background-color:gray; color:white; }
.Red  { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>

Result in Excel 2007:

UTF8 HTML Excel

In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:

<html xmlns:x="urn:schemas-microsoft-com:office:excel">
    <head>
        <meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
        <xml>
            <x:ExcelWorkbook>
                <x:ExcelWorksheets>
                    <x:ExcelWorksheet>
                        <x:Name>MySuperSheet</x:Name>
                        <x:WorksheetOptions>
                            <x:DisplayGridlines/>
                        </x:WorksheetOptions>
                    </x:ExcelWorksheet>
                </x:ExcelWorksheets>
            </x:ExcelWorkbook>
        </xml>
    </head>
    <body>
        <table>
            <tr><td>Colum1</td><td>Column2</td></tr>
            <tr><td>Val1</td><td>Val2</td></tr>
            <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
        </table>
    </body>
</html>

This code even allows to specify the name of the worksheet (here "MySuperSheet")

Result in Excel 2007:

enter image description here

Fewer answered 25/11, 2016 at 14:31 Comment(11)
And it's also incredible that people still answer it. :) However, I would argue that CSV is dead: CSV is a deadly simple plain streaming format, and, when I was on that project, we used it really heavily for components interaction too. I would never use HTML format because of its too-representive nature, and I couldn't ever make the users of that project export data to HTML just to make it Excel-friendly. Excel sucked at UTF-8 CSV, and I have no clue what's changed 5 years since then. But that comma-, semi-colon- and tab-delimited files is true madness, I guess.Busby
It is not incredible that people still answer your question because Google brings them here: All with the same question: How do I get stupid Excel to do what I want? Enter 3 words into Google: "excel csv utf8" and your question is the FIRST result world wide. You wrote a bestseller!Fewer
By the way: Why did you mark an answer as accepted answer which does not answer what you wanted to know ? You wrote: "When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,"Fewer
Just googled for "excel csv utf8", and I didn't know the question is at the top. :) Regarding accepting the answer: I marked the answer as the best one just because it really answered how it could be done in Excel. Maybe I was too strict when I said "without any problems". I think that import wizards are fine for me being a programmer, but it's shame that Excel required (or still requires) to do import rather than just open.Busby
Since you've taken the time to point out how none of the other answers here cause Excel to open a CSV file correctly, it's worth mentioning that this answer doesn't do that either. The originally-accepted answer (which was deleted by a moderator) recommended exporting the data to an Excel file directly, or exporting an HTML file and allowing users to open the file in Excel. Either of those solutions are less hacky than saving an HTML file with an XLS file extension, but no answer has been proposed that actually answers the question.Dye
@Warrior: "..but no answer has been proposed that actually answers the question." This is not correct. I HAVE answered that question: At least with Excel 2007 it is NOT POSSIBLE. This is the definite answer. If Excel ignores the UTF8 BOM and interprets data as ANSI, there is no way to import for example greek or russian text into Excel via CSV by a simple double click on the file (which was the question: "without any problems for a user"). What I propose is a working solution that functions with ALL Excel versions with a double click and does not require additional steps in the menus of Excel.Fewer
Hell, like I have a choice to export as something else than CSV when it's the only button I can see in my interface... I agree that it is quite rubbish though, but we have to deal with it.Showy
Is there anyway to preserve the original excel cell lines instead of showing a blank page?Moreland
I updated my answer with an additional example to show all gridlines.Fewer
Not only does this not answer the question, it's completely wrong to state CSV is "legacy" and "dead". Just because you doesn't see the use for it, does not mean it's dead. Also, you clearly don't understand it's history or how widely used it is today. This is better than XML (due to smaller packet sizes) and the issue here is how Microsoft simply does not follow a standard for the UTF8 format. The correct answer would be to use open office and not Microsoft as Microsoft has never been good at encoding in the first place.Pappas
You are partly correct. I updated my answer. It is not the fault of CSV itself. It is the fault of Microsoft. But the quesion was about Microsoft Excel so you surely do not answer the original question if you recommend to use Open Office (which has other bugs which may even be worse). You surely will not write to the recipients of your emails: "Please install Open Office to see the attached CSV file correctly". There must be better solutions. And my answer is a far better solution than asking people to install Open Office just to open an CSV file.Fewer
B
60

We have used this workaround:

  1. Convert CSV to UTF-16 LE
  2. Insert BOM at beginning of file
  3. Use tab as field separator
Barrelhouse answered 26/1, 2012 at 10:11 Comment(6)
Thanks! I tried all the other answers in this thread, but converting to UTF-8 simply didn't work. When i tried UTF-16 with BOM it worked instantly.Purkey
For Excel for Mac 2011 I was successful with UTF-16 little endian csv filesSemele
Thanks, this solution is excellent for giving my users a unicode csv that they can open in Excel.Highbrow
can you explain more exactly how to convert to UTF-16 LE and which BOM to use?Etalon
This answer follows exactly the same steps, in Python and FastAPI, in order to achieve the same goal. Future readers might find it helpful.Pullover
Thanks for your solution! It's the only solustion where TSV (tab separated values) file can be edited in any text editor and opened in Excel without additional steps, just by double click on the file. I've converted my file to UTF-16 LE with BOM by NotePad++ and added BOM using the same app (by the code \u200b)Furie
R
26

Had the same problems with PHP-generated CSV files. Excel ignored the BOM when the Separator was defined via "sep=,\n" at the beginning of the content (but of course after the BOM).

So adding a BOM ("\xEF\xBB\xBF") at the beginning of the content and setting the semicolon as separator via fputcsv($fh, $data_array, ";"); does the trick.

Realty answered 22/10, 2013 at 11:2 Comment(1)
Best answer for me, thanks!Septilateral
S
25

You can convert .csv file to UTF-8 with BOM via Notepad++:

  1. Open the file in Notepad++.
  2. Go to menu EncodingConvert to UTF-8-BOM.
  3. Go to menu FileSave.
  4. Close Notepad++.
  5. Open the file in Excel .

Worked in Microsoft Excel 2013 (15.0.5093.1000) MSO (15.0.5101.1000) 64-bit from Microsoft Office Professional Plus 2013 on Windows 8.1 with locale for non-Unicode programs set to "German (Germany)".

Sacci answered 20/9, 2018 at 8:15 Comment(3)
Did what you said, didn't work at all. Converting/encoding, setting character sets, nothing worked in my case. Import file to google sheets, download as csv, worked flawlessly. Your welcome from 2045Maier
I had to convert it with Encoding -> Cinvert ti UTF-8-BOM Excel version 2016Wetzel
It worked well, only I had to save with UTF-8 BOM. FInally, a way to work around.Harned
B
19

Old question but heck, the simplest solution is:

  1. Open CSV in Notepad
  2. Save As -> select the right encoding
  3. Open the new file
Beetle answered 29/4, 2014 at 13:39 Comment(5)
Which Excel version do you use?Busby
Brilliant, simply works and solves the problem (at least for me)Liftoff
Nope, because the problem is to have Excel open the .csv file as a csv file. I mean, I have a process where I generate csv files. The user then would only be required to open the files. Here he/she is required to do more than that.Vervet
@Vervet if you generate files on a regular basis this isn't the best solution. But if it's a one-off, this answer is perfect. It works because Notepad will put the BOM at the beginning of a UTF-8 or UTF-16 file.Swimmingly
Works like a charm. ((:Mafaldamafeking
A
14

I have had the same issue in the past (how to produce files that Excel can read, and other tools can also read). I was using TSV rather than CSV, but the same problem with encodings came up.

I failed to find any way to get Excel to recognize UTF-8 automatically, and I was not willing/able to inflict on the consumers of the files complicated instructions how to open them. So I encoded them as UTF-16le (with a BOM) instead of UTF-8. Twice the size, but Excel can recognize the encoding. And they compress well, so the size rarely (but sadly not never) matters.

Analogous answered 9/7, 2013 at 9:18 Comment(4)
It's an old question, but I'm happy that people still answer it. Thank you. :)Busby
@LyubomyrShaydariv: since I answered this question, a colleague told me that the latest Excel does identify UTF-8 CSV files provided they have an initial BOM. So in a few years, when everyone in the world (or anyway, everyone we deliver the files to) is on that version of Excel or better, I might be able to change my encoding :-)Analogous
Strange, for me it does not work. Turns the characters into chinese symbols... (they are supposed to be hebrew)Vervet
I had chinese characters in Excel when I saved in "UTF-16 BE with BOM". When I tried "UTF-16 LE with BOM" it worked! I used Sublime Text for testing.Bedfordshire
M
11

As I posted on http://thinkinginsoftware.blogspot.com/2017/12/correctly-generate-csv-that-excel-can.html:

Tell the software developer in charge of generating the CSV to correct it. As a quick workaround you can use gsed to insert the UTF-8 BOM at the beginning of the string:

gsed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' file.csv

This command inserts the UTF-4 BOM if not present. Therefore it is an idempotent command. Now you should be able to double click the file and open it in Excel.

Matthiew answered 8/12, 2017 at 17:40 Comment(6)
Thanks this fixed it. I have just converted the csv ';' separated to UTF-BOM in Notepad++ and know excels opens it correctly.Glaab
Exactly what was needed! Here is a pointer to how to implement this in PHP #25686691Displease
This is the workaround for excel which could not recognize utf chareacters in csv files. So this correction to fix excel issue.Ewens
This worked for Excel 2007 Greek Version, with comma seperators. Thank you.Clack
Changed encoding to 'UTF-16le' in fs.createWriteStream(fileName,{encoding: 'UTF-16le'}) and excel opened the CSV file correctlyBasenji
This is absolutely the best answer. I write my csv to a std::ofstream. So if I first: file << "\xef\xbb\xbf";. I can then follow with utf8 data and it opens fine in excel 2007. Thanks!Hourly
A
6

In php you just prepend $bom to your $csv_string:

$bom = sprintf( "%c%c%c", 239, 187, 191); // EF BB BF
file_put_contents( $file_name, $bom . $csv_string );

Tested with MS Excel 2016, php 7.2.4

Anoxemia answered 25/12, 2019 at 11:57 Comment(0)
S
5

A truly amazing list of answers, but since one pretty good one is still missing, I'll mention it here: open the csv file with google sheets and save it back to your local computer as an excel file.

In contrast to Microsoft, Google has managed to support UTF-8 csv files so it just works to open the file there. And the export to excel format also just works. So even though this may not be the preferred solution for all, it is pretty fail safe and the number of clicks is not as high as it may sound, especially when you're already logged into google anyway.

Sightly answered 8/4, 2017 at 19:26 Comment(0)
C
4

Simple vba macro for opening utf-8 text and csv files

Sub OpenTextFile()

   filetoopen = Application.GetOpenFilename("Text Files (*.txt;*.csv), *.txt;*.csv")
   If filetoopen = Null Or filetoopen = Empty Then Exit Sub

   Workbooks.OpenText Filename:=filetoopen, _
   Origin:=65001, DataType:=xlDelimited, Comma:=True

End Sub

Origin:=65001 is UTF-8. Comma:True for .csv files distributed in colums

Save it in Personal.xlsb to have it always available. Personalise excel toolbar adding a macro call button and open files from there. You can add more formating to the macro, like column autofit , alignment,etc.

Coprolite answered 19/3, 2012 at 15:2 Comment(2)
Thanks. Unfortunately, this is the suggestion for really advanced Excel users, and what I'm aiming for, asking this question, is making Excel do it itself automatically.Busby
I tried to use this code snippet but it doesn't work for me. I use Excel 2013.Palestrina
S
3

This is my working solution:

vbFILEOPEN = "your_utf8_file.csv"
Workbooks.OpenText Filename:=vbFILEOPEN, DataType:=xlDelimited, Semicolon:=True, Local:=True, Origin:=65001

The key is Origin:=65001

Scyros answered 18/10, 2013 at 9:49 Comment(0)
L
3

Yes it is possible. When writing the stream creating the csv, the first thing to do is this:

myStream.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)
Lendlease answered 28/2, 2014 at 21:18 Comment(3)
You did not read the question! Lyubomyr Shaydariv wrote: " I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."Fewer
This is working code. If you write out UTF-8 files from .Net applications with this lead, Excel will recognize the file as UTF8.Lendlease
May be in YOUR Excel it works. But at least in Excel 2007 the BOM is ignored completely. So if you intend to write a code that works on all Excel versions your code is not usefull.Fewer
K
3

Just for help users interested on opening the file on Excel that achieve this thread like me.

I have used the wizard below and it worked fine for me, importing an UTF-8 file. Not transparent, but useful if you already have the file.

  1. Open Microsoft Excel 2007.
  2. Click on the Data menu bar option.
  3. Click on the From Text icon.
  4. Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button. The Text Import Wizard - Step 1 or 3 window will now appear on the screen.
  5. Choose the file type that best describes your data - Delimited or Fixed Width.
  6. Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin.
  7. Click on the Next button to display the Text Import Wizard - Step 2 or 3 window.
  8. Place a checkmark next to the delimiter that was used in the file you wish to import into Microsoft Excel 2007. The Data preview window will show you how your data will appear based on the delimiter that you chose.
  9. Click on the Next button to display the Text Import Wizard - Step 3 of 3.
  10. Choose the appropriate data format for each column of data that you want to import. You also have the option to not import one or more columns of data if you want.
  11. Click on the Finish button to finish importing your data into Microsoft Excel 2007.

Source: https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0

Kakemono answered 13/10, 2015 at 14:38 Comment(0)
G
3

Yes, this is possible. As previously noted by multiple users, there seems to be a problem with excel reading the correct Byte Order Mark when the file is encoded in UTF-8. With UTF-16 it does not seem to have a problem, so it is endemic to UTF-8. The solution I use for this is adding the BOM, TWICE. For this I execute the following sed command twice:

sed -I '1s/^/\xef\xbb\xbf/' *.csv

, where the wildcard can be replaced with any file name. However, this leads to a mutation of the sep= at the beginning of the .csv file. The .csv file will then open normally in excel, but with an extra row with "sep=" in the first cell. The "sep=" can also be removed in the source .csv itself, but when opening the file with VBA the delimiter should be specified:

Workbooks.Open(name, Format:=6, Delimiter:=";", Local:=True)

Format 6 is the .csv format. Set Local to true, in case there are dates in the file. If Local is not set to true the dates will be Americanized, which in some cases will corrupt the .csv format.

Graptolite answered 1/12, 2016 at 13:3 Comment(1)
when running the sed command, I get: "sed: invalid option -- 'I'", I think it should be 'i'Jointed
C
3

This is not accurately addressing the question but since i stumbled across this and the above solutions didn't work for me or had requirements i couldn't meet, here is another way to add the BOM when you have access to vim:

vim -e -s +"set bomb|set encoding=utf-8|wq" filename.csv
Complete answered 1/11, 2017 at 17:10 Comment(0)
S
2

hi i'm using ruby on rails for csv generation. In our application we plan to go for the multi language(I18n) and we faced an issue while viewing I18n content in the CSV file of windows excel.

Was fine with Linux (Ubuntu) and mac.

We identified that windows excel need to be imported the data again to view the actual data. While import we will get more options to choose character set.

But this can’t be educated for each and every user, so solution we looking for is to open just by double click.

Then we identified the way of showing data by open mode and bom in windows excel with the help of aghuddleston gist. Added at reference.

Example I18n content

In Mac and Linux

Swedish : Förnamn English : First name

In Windows

Swedish : Förnamn English : First name

def user_information_report(report_file_path, user_id)
    user = User.find(user_id)
    I18n.locale = user.current_lang
    open_mode = "w+:UTF-16LE:UTF-8"
    bom = "\xEF\xBB\xBF"
    body user, open_mode, bom
  end

def headers
    headers = [
        "ID", "SDN ID",
        I18n.t('sys_first_name'), I18n.t('sys_last_name'), I18n.t('sys_dob'),
        I18n.t('sys_gender'), I18n.t('sys_email'), I18n.t('sys_address'),
        I18n.t('sys_city'), I18n.t('sys_state'), I18n.t('sys_zip'),
        I18n.t('sys_phone_number')
    ]
  end

def body tenant, open_mode, bom
    File.open(report_file_path, open_mode) do |f|
      csv_file = CSV.generate(col_sep: "\t") do |csv|
        csv << headers
        tenant.patients.find_each(batch_size: 10) do |patient|
          csv <<  [
              patient.id, patient.patientid,
              patient.first_name, patient.last_name, "#{patient.dob}",
              "#{translate_gender(patient.gender)}", patient.email, "#{patient.address_1.to_s} #{patient.address_2.to_s}",
              "#{patient.city}", "#{patient.state}",  "#{patient.zip}",
              "#{patient.phone_number}"
          ]
        end
      end
      f.write bom
      f.write(csv_file)
    end
  end

Important things to note here is open mode and bom

open_mode = "w+:UTF-16LE:UTF-8"

bom = "\xEF\xBB\xBF"

Before writing the CSV insert BOM

f.write bom

f.write(csv_file)

Windows and Mac

File can be opened directly by double clicking.

Linux (ubuntu)

While opening a file ask for the separator options -> choose “TAB” enter image description here

Schrader answered 15/8, 2018 at 12:27 Comment(0)
A
2

In Python, use encoding=utf-8-sig which is Python's name for UTF-8 with BOM. Just utf-8 will not get picked up by Excel or other Microsoft software. From https://docs.python.org/3/library/codecs.html:

To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python calls "utf-8-sig")

See also What is the difference between utf-8 and utf-8-sig?

Using the standard python csv lib, this would be:

with open('some.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)

It also works with other libs such as pandas:

df.to_csv('some.csv', encoding='utf-8-sig')
Attractive answered 13/4, 2023 at 23:52 Comment(0)
A
1
  1. Download & install LibreOffice Calc
  2. Open the csv file of your choice in LibreOffice Calc
  3. Thank the heavens that an import text wizard shows up...
  4. ...select your delimiter and character encoding options
  5. Select the resulting data in Calc and copy paste to Excel
Aristocratic answered 6/11, 2017 at 2:40 Comment(0)
S
1

I faced the same problem a few days ago, and could not find any solution because I cannot use the import from csv feature because it makes everything to be styled as string.

My solution was to first open the file with notpad++ and change the encode to ASCII. Then just opened the file in excel and it worked as expected.

Sculpin answered 24/7, 2019 at 18:37 Comment(0)
I
1

Working solution for office 365

  • save in UTF-16 (no LE, BE)
  • use separator \t

Code in PHP

$header = ['číslo', 'vytvořeno', 'ěščřžýáíé'];
$fileName = 'excel365.csv';
$fp = fopen($fileName, 'w');
fputcsv($fp, $header, "\t");
fclose($fp);

$handle = fopen($fileName, "r");
$contents = fread($handle, filesize($fileName));
$contents = iconv('UTF-8', 'UTF-16', $contents);
fclose($handle);

$handle = fopen($fileName, "w");
fwrite($handle, $contents);
fclose($handle);
Intrinsic answered 23/4, 2020 at 6:6 Comment(1)
What do you mean by no LE or BE. It must be one of them right?Undulatory
C
1

Found a solution for ASP.NET Core to download CSV's as UTF8 with POM:

byte[] csvBytes = Encoding.Default.GetBytes(csvString);
UTF8Encoding utf8 = new UTF8Encoding(true);
byte[] bom = utf8.GetPreamble();
var result = bom.Concat(csvBytes).ToArray();
return new FileContentResult(result, MediaTypeHeaderValue.Parse("text/csv; charset=utf-8"));

Excel is recognizes the downloaded CSV file than as UTF8.

Chaudoin answered 23/8, 2022 at 12:47 Comment(1)
works like a charm, nice 🦾🦾🦾🦾🦾🦾🦾🦾🦾🦾Hampton
E
1

15 years later, I finally found the solution: write CSV in UTF-16-LE without BOM, and it will work on Excel 2007+.

Example with Python:

import csv
with open('test.csv', "w", encoding="utf-16-le", newline='') as csvfile:
    w = csv.writer(csvfile, delimiter=';')
    w.writerow(["abc", "def"])
    w.writerow(["été", "hiver"])

The output CSV file can be opened directly with Excel, and:

  • (1) no problems with the accents
  • (2) the delimiters will be automatically parsed into multiple columns

Note: if using encoding="utf-16" instead of utf-16-le, (1) will still be ok, but not (2).

Expurgate answered 27/11, 2023 at 22:59 Comment(5)
Thank you for posting this. It proved helpful when writing this answer.Pullover
Thanks @Chris. Do you confirm you get the same behaviour for you? i.e. UTF-16-LE without BOM solves it all?Expurgate
For Python's built-in csv module, yes, the behaviour is the same, regardless of using utf-16 or utf-16-le encoding (both without BOM).Pullover
@Pullover for Python built-in csv, as far as I remember, everything works, including utf8, etc. (Do you have problems with Python csv module with utf8?) My post specifically refers to Excel for which utf8 is problematic. Do you confirm too?Expurgate
That is what I'm referring to (Excel). When using Pandas.DataFrame.to_csv() instead, BOM seems to be required, when trying to open the generated CSV file in Excel. Please have a look at the linked answer above for more details.Pullover
B
0

This is an old question but I've just encountered had a similar problem and the solution may help others:

Had the same issue where writing out CSV text data to a file, then opening the resulting .csv in Excel shifts all the text into a single column. After having a read of the above answers I tried the following, which seems to sort the problem out.

Apply an encoding of UTF-8 when you create your StreamWriter. That's it.

Example:

using (StreamWriter output = new StreamWriter(outputFileName, false, Encoding.UTF8, 2 << 22)) {
   /* ... do stuff .... */
   output.Close();
}
Blakely answered 25/3, 2013 at 13:41 Comment(11)
@elmue care to elaborate a little? Surely outputting CSV using the correct encoding to start with ensures there's no compatibility issues with Excel later in the workflow?Blakely
The code is wrong because you do not need output.Close() if you have a using() statement. Apart from that Excel CSV inport is very primitive. I would not use it at all. If you want to import in Excel use a HTML table instead and open it in Excel.Fewer
Thanks for clarifying @Fewer - unfortunately I've run into a number of scenarios (including files across network drives) where a close is required explicitly as well as the disposal. Not found a reason why as I think I remember seeing a call to Close() in the Dispose stack but there we go. You're also incorrect in your statement about CSV import being primitive as it does not need the inefficient HTML approach you've suggested. In fact creating additional programmatic steps to take plain-text data, transform into HTML then pull into Excel seems counter intuitiveBlakely
continued... However, different scenarios have different requirements and my example correctly shows how to specify the encoding as the OP requestedBlakely
Please read the original question again: "I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that." I tested the same and come to the same result: Excel does NOT recognize UTf8 BOM. Try it! Write a CSV file and put greek or russian charcters into it. Then open that in Excel and you will get garbarge. So your answer does not solve anything.Fewer
CSV is primitive because it depends on your settings in control panel if comma or semicolon is used as separator which is the most stupid design. A file with comma is detected correctly on some computers but on other computers not. If the data itself contains commas you must quote them. And UTF8 does NOT work. All these problems do not appear loading a HTML table into Excel.Fewer
The programmatic steps to create an HTML table are ridiculous. Save the following into a file with the extension XLS and save it as UTF8 with BOM and open it in Excel: <table><tr><td>Colum1</td><td>Column2</td></tr> <tr><td>Авиабилет</td><td>Tλληνικ</td></tr></table>Fewer
I have read the OP - I've supplied a way to create the CSVs via Excel that solve the problem to start with, and also open forcing the specific encoding the OP has asked for. Seems to work just fine with Excel 2013 and .NET 4 so I'm guessing there's a version difference in the Office type libraries between our experiments? The HTML approach is inefficient and I'm still surprised you're trying to extol it's virtues - what you're describing as plaint text anomalies is down to the encoding. Try importing as plain text in a simple using {} and save as a new file forcing the encoding. It works.Blakely
As I already said your approach does not work in Excel 2007 and as you never know which Excel versions your users use, it is not a generic solution.Fewer
Seemed to work fine across 2007 and 2013 so not sure where you're going wrongBlakely
An UTF8 BOM definitely does NOT work with Excel 2007. It is ignored.Fewer
J
0

If you want to make it fully automatic, one click, or to load automatically into Excel from say a web page, but can't generate proper Excel files, then I would suggest looking at SYLK format as an alternative. OK it is not as simple as CSV but it is text based and very easy to implement and it supports UTF-8 with no issues.

I wrote a PHP class that receives the data and outputs a SYLK file which will open directly in Excel by just clicking the file (or will auto-launch Excel if you write the file to a web page with the correct mime type. You can even add formatting (like bold, format numbers in particular ways etc) and change column sizes, or auto size columns to the text in the columns and all in all the code is probably not more than about 100 lines.

It is dead easy to reverse engineer SYLK by creating a simple spreadsheet and saving as SYLK and then reading it with a text editor. The first block are headers and standard number formats that you will recognise (which you just regurgitate in every file you create), then the data is simply an X/Y coordinate and a value.

Jsandye answered 10/5, 2016 at 10:16 Comment(0)
D
0

I am generating csv files from a simple C# application and had the same problem. My solution was to ensure the file is written with UTF8 encoding, like so:

// Use UTF8 encoding so that Excel is ok with accents and such.
using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
{
    SaveCSV(writer);
}

I originally had the following code, with which accents look fine in Notepad++ but were getting mangled in Excel:

using (StreamWriter writer = new StreamWriter(path))
{
    SaveCSV(writer);
}

Your mileage may vary - I'm using .NET 4 and Excel from Office 365.

Declension answered 14/8, 2018 at 21:21 Comment(0)
M
0

I tried everything I could find on this thread and similar, nothing worked fully. However, importing to google sheets and simply downloading as csv worked like a charm. Try it out if you come to my frustration point.

Maier answered 20/12, 2018 at 20:6 Comment(0)
B
0

It's March 2022, and it seems we cannot use both a BOM and the sep=... line. Adding the sep=\t or similar, makes Excel ignore the BOM.

Using a semicolon seems to be a default Excel understands, in which case we can skip the sep=... line and it works.

This is Microsoft 365 with Excel version 2110 build 14527.20276.

Bueno answered 9/3, 2022 at 13:27 Comment(0)
C
0

Just sharing a comprehensive function that might make your life easier working with CSV files.... please note last function argument in relation to this topic

function array2csv($data, $file = '', $download = true, $mode = 'w+', $delimiter = ',', $enclosure = '"', $escape_char = "\\", $addUnicodeBom = false)
{
    $return = false;

    if ($file == '') {
        $f = fopen('php://memory', 'r+');
    } else {
        $f = fopen($file, $mode);
    }

    if ($addUnicodeBom) {
        $utf8_with_bom = chr(239) . chr(187) . chr(191);
        fwrite($f, $utf8_with_bom);
    }


    foreach ($data as $line => $item) {

        fputcsv($f, $item, $delimiter, $enclosure, $escape_char);
    }

    rewind($f);

    if ($download == true) {
        $return = stream_get_contents($f);
    } else {
        $return = true;
    }

    return $return;
}
Cuirassier answered 3/11, 2022 at 9:16 Comment(0)
H
0

Drastic change I made: instead of providing .csv to my users I provide .xlsx :)

Since I was programmatically generating the CSV file and since only humans manipulate those files, there is no reason to keep using the simple format .csv (more appropriate to be parsed). Note .xlsx files are widely supported (it's not only in Excel).

I did not have to change all the logic, I just took my previous CSV generation script, and added an Excel library to convert the CSV to XLSX in a few lines. You should be able to do so no matter the language :)

Ideally I would have preferred to go without Microsoft stuff, but sometimes you just sit on the "standard".

Hebdomadary answered 3/10, 2023 at 10:14 Comment(0)
M
0

For those that can afford to convert a few files manually, there are some answers here mentioning on how to convert them with Notepad++, but I wanted to add the solution with VS Code (Visual Studio Code / VSCode):

  • Open the CSV file in VS Code
  • In the bottom bar, click in the encoding (in my case, is the 4th from the right: "UTF-8" , "LF", "Plain Text" , notifications bell). If you hover over it, it will show "Select Encoding". If you can't find it, the alternative is to click on VS Code "Help" menu (top bar) -> "Show All Commands" -> type "Change File Encoding"
  • It will open a list of actions in the top bar. Click "Save with Encoding"
  • It will open a list of possible encodings. Click in the one you want, and you are good to go, open it in Excel and display the correct characters there. In my case, I selected "UTF-8 with BOM", which worked in Excel 365 ("Excel for Microsoft 365 MSO (Version 2311)")
Midwest answered 6/12, 2023 at 14:18 Comment(0)
I
-1

First save the Excel spreadsheet as Unicode text. Open the TXT file using Internet explorer and click "Save as" TXT Encoding - choose the appropriate encoding, i.e. for Win Cyrillic 1251

Infringe answered 6/12, 2011 at 20:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.