Ruby unable to parse a CSV file: CSV::MalformedCSVError (Illegal quoting in line 1.)
Asked Answered
K

11

41

Ubuntu 12.04 LTS

Ruby ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

Rails 3.2.9

Following is the content of my received CSV file:

"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"

However when I am trying to parse the CSV file I am getting error:

1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""} 

1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

Then I tried simplifying the data i.e.

"name","age","email"
"jignesh","30","[email protected]"

however still I am getting the same error:

      1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
      from (irb):23
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

Again I tried simplifying the data like this:

name,age,email
jignesh,30,[email protected]

and it works.See the output below:

  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  [email protected]
   => nil 

But I will be receiving the CSV files having quoted data so removing quotes solution is not actually I am looking for.I am unable to figure out what is causing the error: CSV::MalformedCSVError: Illegal quoting in line 1. in my earlier examples.

I have verified that in the CSV there are no leading/trailing spaces by enabling "Show whitespace characters" and "Show Line Endings" in my text editor.Also I have verified the encoding using following.

  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
  => #<Encoding:UTF-8> 

Note: I tried using CSV.read too but same error with that method.

Can anybody please help me getting out of the problem and make me understand where it is going wrong?

=====================

I just found following post at: http://www.ruby-forum.com/topic/448070 and tried following:

  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)

  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end

and got the following output:

   =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
    =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]

which messed up reading the data properly as the default col_sep used is a comma character. However I tried using quote_char option like this:

  arr_of_arrs = CSV.parse(file_data, :quote_char => "'")

but it ended up the following error:

   CSV::MalformedCSVError (Illegal quoting in line 1.):

Thanks, Jignesh

Kp answered 27/5, 2013 at 12:4 Comment(6)
Used the sample data you've provided and the parsing works fine. Not getting any CSV::MalformedCSVError: Illegal quoting in line 1 error.Ashy
In my edited section the output contains following: "\xEF\xBB\xBF'date/time'".Is it creating some problem? I don't know what does it represent.Thanks.Kp
The Unicode characters at the start of the file are BOM (Byte Order Mark). You can try sub!(/^\xEF\xBB\xBF/, '') or CSV.foreach("test.csv", encoding: "bom|utf-8")Ashy
Thanks Anand I shall try using your encoding suggested solution.Meanwhile with my temporary solution in Edit section when used header_converters like: arr_of_arrs = CSV.parse(file_data, { col_sep: ";", headers: true, header_converters: [ :symbol ] }) I got following error: Encoding::UndefinedConversionError ("\xEF" from ASCII-8BIT to UTF-8).That one mentions ASCCII-8BIT as encoding.How does that encoding matters, how those BOM characters got in there? Such errors should be clearly shown in the exception thrown by library instead of finding them by chance in to_s output.Kp
The following link joelonsoftware.com/articles/Unicode.html, will help in understanding how encoding matters. As for how those BOM characters got in there, you'll need to check the origin of the CSV file received and how it was saved.Ashy
@Anand using your suggestion worked: csv_options[:encoding] = "bom|utf-8"; CSV.foreach(uploaded_file.path, csv_options) do |row| ... end; where csv_options is a Hash of options CSV.new supports.Kp
T
36
quote_chars = %w(" | ~ ^ & *)
begin
  @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
rescue CSV::MalformedCSVError
  quote_chars.empty? ? raise : retry 
end

it's not perfect but it works most of the time.

N.B. CSV.parse takes the same parameters as CSV.read, so either a file or data from memory can be used

Tincal answered 27/9, 2013 at 4:9 Comment(0)
N
25

Anand, thank you for the encoding suggestion. This solved the illegal quoting problem for me.

Note: If you want the iterator to skip over the header row add headers: :first_row, like so:

CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row)
Niobic answered 2/4, 2015 at 20:37 Comment(2)
Thanks! encoding: "bom|utf-8" is what solved my problem.Rolandrolanda
For those getting ArgumentError: unknown encoding name - bom|utf-8 with ruby 2.4+, make sure you update the csv gem to version 3 or newer (gem 'csv', '~> 3.0' in your Gemfile).Linnealinnean
L
14

I just had an issue like this and discovered that CSV does not like spaces between the col-sep and the quote character. Once I removed those everything went fine. So I had:

12,  "N",  12, "Pacific/Majuro"

but once I gsubed out the spaces using

.gsub(/,\s+\"/,',\"')

resulting in

12,"N",  12,"Pacific/Majuro"

everything went fine.

Lander answered 18/10, 2013 at 18:30 Comment(2)
Note if you want to replace spaces on both sides of quoted string within comma value...gsub(/,\s+\"/,',"').gsub(/\"\s+,/,'",')Hippopotamus
This is the second time i've had this issue and the second time i've found this answer. Unfortunately, I can't upvote it again.Antagonist
I
14

Rails 6 version, ruby 2.4+

CSV.foreach(file, liberal_parsing: true, headers: :first_row) do |row|
    // do whatever
end

https://ruby-doc.org/stdlib-2.4.0/libdoc/csv/rdoc/CSV.html

Inhumane answered 25/8, 2020 at 7:53 Comment(1)
Fantastic @mArtinko5MB! liberal_parsing: When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.Sumption
F
5

from this thread pass the option :quote_char => "|"

CSV.read(filename, :quote_char => "|")

Follower answered 16/12, 2019 at 9:37 Comment(2)
👍 this seems to work with the CSV.foreach method as wellSinuate
it works, but in conjunction with headers: true or headers: :first_row it messes up the column valuesInhumane
H
4

Add the :liberal_parsing => true argument to CSV.read and this should solve some of the issues with "illegal quoting"

Hypocorism answered 3/11, 2020 at 13:40 Comment(0)
H
2

I had a problem with the trademark character that was throwing this error.

The trademark character translates to \"! in UTF-8, so it was the open-ended quotation symbol that was throwing the error. So I did this:

.gsub!("\"!", "")

And then I tried creating my CSV object and it worked fine.

Hinkel answered 25/2, 2016 at 12:11 Comment(0)
H
0

I attempted to read the file and get a string and then parse thes tring into a CSV table, but received an exception:

CSV.read(File.read('file.csv'), headers: true)
CSV::MalformedCSVError: Unclosed quoted field on line 1794.

None of the answers provided here worked for me. In fact, the one with highest votes was taking so long to parse that eventually I terminated the execution. It most likely was raising many exceptions, and that time is costly on a large file.

Even more problematic, the error is not so helpful, since it is a large CSV file. Where exactly is line 1794? I opened up the file in LibreOffice which opened without any problems. Line 1794 was the last row of data of the csv file. So apparently the problem had to do with the end of the CSV file. I decided to inspect the contents as a string with File.read. I noticed the string ended with a carriage return character:

,\"\"\r

I decided to use chomp and remove the carriage return at the end of file. Note if $/ has not been changed from the default Ruby record separator, then chomp also removes carriage return characters (that is it will remove \n, \r, and \r\n).

CSV.parse(File.read('file.csv' ).chomp, headers: true)
 => #<CSV::Table mode:col_or_row row_count:1794>

And it worked. The problem was the \r character at the end of the file.

Halfbeak answered 23/12, 2018 at 0:14 Comment(0)
M
0

I had this problem in a line like Agricover 22040169 Access; TonnoSport

The trouble is the CSV parser is anticipating ", in the event that they seem, to completely surround the comma-delimited text.

solution use a quote person except " that i was certain would not appear in my records:

CSV.parse(file_path, headers: true, :quote_char => "|")

For more information read?

https://ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/CSV.html#class-CSV-label-Reading

Mariselamarish answered 2/10, 2022 at 14:14 Comment(0)
B
-1

A less common cause of this error is when the file doesn't do any field quoting, but quote_char is still set (by default it's ") and one or more fields happen to contain the character.

To disable field quoting entirely, set quote_char: nil in the parsing options.

For example, given a file /tmp/people.csv like this:

Actor,Dwayne "The Rock" Johnson,1972-05-02
Character,TV's Frank,1956-08-30

It could be parsed with this:

CSV.read('/tmp/people.csv', quote_char: nil)
Binns answered 28/2, 2021 at 21:0 Comment(1)
As a quick follow-up, using liberal_parsing: true also works. Perhaps the "safer" method, of the two options, because it should allow for quoted values using " in the future.Farber
B
-4

Try this hint:

  1. Open your CSV file in a text editor
  2. Select the whole file and copy it
  3. Open a new text file
  4. Paste the CSV data into the new file and Save the new file
  5. Import your new CSV file
Broz answered 10/9, 2013 at 8:16 Comment(1)
csv file could be in MBs, which cannot be opened like thatBunche

© 2022 - 2024 — McMap. All rights reserved.