Ruby: How to generate CSV files that has Excel-friendly encoding
Asked Answered
J

5

25

I am generating CSV files that needs to be opened and reviewed in Excel once they have been generated. It seems that Excel requires a different encoding than UTF-8.

Here is my config and generation code:

csv_config = {col_sep: ";", 
              row_sep: "\n", 
              encoding: Encoding::UTF_8
             }

csv_string = CSV.generate(csv_config) do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end

When opening this in Excel, the special characters are not being displayed properly:

Text a  Text b  Text æ Text ø Text å

Any idea how to ensure proper encoding?

Jawbreaker answered 21/5, 2015 at 8:16 Comment(5)
Try putting # encoding: UTF-8 as your Ruby file's first line (second if you have a hash-bang line, #!/usr/bin/env ruby). I believe you are writing in UTF-8, but the Ruby source file is taken to be encoded as US_ASCII. (With Ruby 2.0+, source encoding defaults to UTF-8)Touristy
I am using ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin12.4.0] so I suppose that means that my installation is already defaulting to UTF-8.Jawbreaker
No experience with Ruby. But Excel can open semicolon delimited CSV files which are UTF-8 encoded. But the file must have a BOM at its beginning. And whether the semicolon can be used as delimiter is locale dependent. So the best approach is to use tabulator delimited CSV and those UTF-16LE encoded. This should be most locale independent.Petrology
What Excel are you using? I had no trouble getting the special characters to display in Excel 2013.Nappy
Another hint: With the axlsx-gem it is easy to create direct a xlsx-files.Halfassed
A
37

Excel understands UTF-8 CSV if it has BOM. That can be done like:

Use CSV.generate

# the argument of CSV.generate is default string
csv_string = CSV.generate("\uFEFF") do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end

Use CSV.open

filename = "/tmp/example.csv"

# Default output encoding is UTF-8
CSV.open(filename, "w") do |csv|
  csv.to_io.write "\uFEFF" # use CSV#to_io to write BOM directly 
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
Anesthetist answered 24/8, 2018 at 1:2 Comment(8)
Thanks the CSV.open solution worked perfectly for me! Slightly cleaner solution than other answers.Chowder
#<NoMethodError: undefined method `to_io' for #<StringIO:0x00000001d0c540> :(Sadducee
Thanks for also saying how with CSV.openHortenciahortensa
This works for me even when I open on excel online, thanks!Abroach
FYI: \uFEFF is BOM for UTF-16. Use \xEF\xBB\xBF for UTF-8. Here is list of BOMs for UTF encodings.Mccafferty
if it has currency symbols like '₹ 60' AND '€ 70' it will not work so please check this answer it might be helpful for this scenario https://mcmap.net/q/537846/-some-currency-symbols-not-showing-in-ms-excelPippa
this is the only thing that worked for me. I wish we could just do "CSV.open(filename, "w", encoding: "bom|utf-8")". That works while reading the file but it doesn't while writing to the output file.Purpure
Just for the record; Microsoft also says this in their docs: > You can open a CSV file encoded with UTF-8 normally if it was saved with BOM (Byte Order Mark). Otherwise, you can open it through either of the following ways. support.microsoft.com/en-us/office/…Accompanyist
C
32

The top voted answer from @joaofraga worked for me, but I found an alternative solution that also worked - no UTF-8 to ISO-8859-1 transcoding required.

From what I've read, Excel, can indeed handle UTF-8, but for some reason, it doesn't recognize it by default. But if you add a BOM to the beginning of the CSV data, this seems to cause Excel to realise that the file is UTF-8.

So, if you have a CSV like so:

csv_string = CSV.generate(csv_config) do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end

just add a BOM byte like so:

"\uFEFF" + csv_string

In my case, my controller is sending the CSV as a file, so this is what my controller looks like:

def show
  respond_to do |format|
    format.csv do
      #  add BOM to force Excel to realise this file is encoded in UTF-8, so it respects special characters
      send_data "\uFEFF" + csv_string, type: :csv, filename: "csv.csv"
    end
  end
end

I should note that UTF-8 itself does not require or recommend a BOM at all, but as I mentioned, adding it in this case seemed to nudge Excel into realising that the file was indeed UTF-8.

Convertite answered 3/11, 2017 at 15:9 Comment(1)
Don't be confused like i was, older Excel does not work (2010 and above)Thallus
G
11

You should switch the encoding to ISO-8859-1 as following:

CSV.generate(encoding: 'ISO-8859-1') { |csv|  csv << ["Text á", "Text é", "Text æ"] }

For your context, you can do this:

config = {
  col_sep: ';',
  row_sep: ';',
  encoding: 'ISO-8859-1'
}

CSV.generate(config) { |csv|  csv << ["Text á", "Text é", "Text æ"] }

I had the same issue and that encoding fixed.

Grizel answered 20/4, 2016 at 18:2 Comment(3)
The answer above worked for me, but only after I removed the col_sep and row_sep arguments. Just the encoding: 'ISO-8859-1' was all I needed. For context, the specific issue I was having was é characters appearing as éConvertite
Good catch Greg, I will update the example without the context.Grizel
yeah, sure, 8859-1 good for everything yay, try writing some japanese or arabic characters in a CSV file like that. OP asked specifically for UTF-8, so why advise him to go decades back in time?Hosanna
F
0
config = {

  encoding: 'ISO-8859-1'
}

CSV.generate(config) { |csv|  csv << ["Text á", "Text é", "Text æ"] }
Flocculate answered 20/4, 2018 at 18:47 Comment(2)
Would you like to augment your code-only answer with some explanation?Albinaalbinism
OP asked for UTF-8 specifically, not 8859-1 aka ANSIHosanna
B
0

With https://github.com/gtd/csv_builder, I had to:

In the controller action:

@output_encoding = 'UTF-8'
send_data "\uFEFF" + render_to_string(), type: :csv, filename: @filename

Atop the csv.csvbuilder template:

faster_csv.to_io.write("\uFEFF")

I don't know why I had to add the BOM twice, but it did not work with either one on its own.

Belier answered 14/5, 2021 at 20:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.