Rails importing CSV fails due to mal-formation

I get a CSV::MalformedCSVError when I try to import a file using the following code:

  def import_csv(filename, model)
    CSV.foreach(filename, :headers => true) do |row|
      item = {}
      row.to_hash.each_pair do |k, v|
        item.merge!({k.downcase => v})
      end
      model.create!(item)
    end
  end

The CSV files are HUGE, so is there a way I can just log the badly formatted lines and CONTINUE EXECUTION with the remainder of the CSV file?

Chanticleer answered 14/11, 2011 at 17:41 Comment(0)

You could try handling the file reading yourself and let CSV work on one line at a time. Something like this:

File.foreach(filename) do |line|
  begin
    CSV.parse(line) do |row|
      # Do something with row...
    end
  rescue CSV::MalformedCSVError => e
    # complain about line
  end
end

You'd have to do something with the header line yourself of course. Also, this won't work if you have embedded newlines in your CSV.
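As a rough sketch of how the header line could be handled with this per-line approach: the helper name `each_csv_line` and the `[row, error]` block arguments below are made up for illustration and are not part of the CSV library. It assumes the first well-formed line is the header, downcases the header names as in the question, and still breaks on embedded newlines.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): parses each physical
# line on its own and yields [hash, nil] for a good row or
# [nil, message] for a line that CSV rejects.
def each_csv_line(io)
  headers = nil
  io.each_line.with_index(1) do |line, lineno|
    begin
      fields = CSV.parse_line(line)
    rescue CSV::MalformedCSVError => e
      yield nil, "line #{lineno}: #{e.message}"
      next
    end

    next if fields.nil? || fields.empty?   # skip blank lines

    if headers.nil?
      # Assumption: the first well-formed line is the header row.
      headers = fields.map { |h| h.to_s.downcase }
      next
    end

    yield headers.zip(fields).to_h, nil
  end
end
```

It could then be used along these lines, with `model` and `filename` taken from the question:

```ruby
File.open(filename) do |file|
  each_csv_line(file) do |row, error|
    if error
      warn error            # log the bad line and keep going
    else
      model.create!(row)
    end
  end
end
```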

Lytle answered 14/11, 2011 at 19:4 Comment(3)
I can't speak authoritatively, but my understanding is that this approach can backfire in cases where a field contains \n, since File will treat that as the start of a new line, whereas the built in CSV library knows how to handle that appropriately. Maybe somebody else can speak to this...Beezer
@toasterlovin Right, that can be a problem. However, in this case, you can't let CSV do the work because the CSV file in question has some malformed records and it seems that the semi-broken CSV file is keeping CSV.foreach from working.Lytle
I was playing around with this tonight, alternative approach forthcoming.Beezer

One problem with using File to manually go through each line in the file is that CSV files can contain fields with \n (newline character) in them. File will treat that as the end of a line, and you will end up trying to parse a partial row.
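A small sketch of that failure mode, using made-up sample data (`SAMPLE`) and a hypothetical helper (`parse_each_physical_line`): parsing the whole string keeps the quoted newline inside one field, while parsing each physical line separately makes CSV reject the first half of the split record.

```ruby
require 'csv'

# Made-up sample: the second record has a quoted field spanning two
# physical lines.
SAMPLE = "id,note\n1,\"first line\nsecond line\"\n2,plain\n"

# Hypothetical helper: parse every physical line on its own and return
# either the parsed fields or the error class if CSV rejects the line.
def parse_each_physical_line(data)
  data.each_line.map do |line|
    begin
      CSV.parse_line(line)
    rescue CSV::MalformedCSVError => e
      e.class
    end
  end
end

# Parsing the whole document gives three rows; the newline stays in the field.
whole = CSV.parse(SAMPLE)

# Parsing per physical line gives four results; the line that opens the
# quoted field is reported as malformed because its quote is never closed.
per_line = parse_each_physical_line(SAMPLE)
```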

Here is another approach that might work for you:

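# CSV.new expects a String of CSV data or an IO, so a bare path would be
# parsed as data; CSV.open opens the file at that path instead.
@csv = CSV.open('path/to/file.csv')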

loop do
  begin
    row = @csv.shift
    break unless row
    # do stuff
  rescue CSV::MalformedCSVError => error
    # handle the error
    next
  end
end

The main downside that I see with this approach is that you don't have access to the CSV row string when handling the error, just the CSV::MalformedCSVError itself.

Beezer answered 26/4, 2017 at 6:22 Comment(3)
Will this one load the whole file into memory? Is there any alternative that has the same behavior as foreach?Tawny
@Tawny It's been a while, but from what I recall, the shift method is what the CSV library uses under the hood to read individual rows from the CSV file for both the methods that do load the entire file into memory as well as the methods that do not read the entire file into memory, so this should also avoid loading the whole file into memory unless you save references to the row variable in whatever you do in the loop.Beezer
yeah it was confirmed here ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/… btw why I cannot tag people?Tawny
