Ruby on Rails - Import Data from a CSV file

I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.

This is my table:

create_table "mouldings", :force => true do |t|
  t.string   "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string   "name"
  t.integer  "supplier_id"
  t.decimal  "length",         :precision => 3, :scale => 2
  t.decimal  "cost",           :precision => 4, :scale => 2
  t.integer  "width"
  t.integer  "depth"
end

Can you give me some code to show me the best way to do this? Thanks.

Chainman answered 10/12, 2010 at 16:8 Comment(0)
420
require 'csv'    

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
Pimbley answered 10/12, 2010 at 16:15 Comment(12)
You can put it in a Rake task, or in a controller action, or anywhere you like....Pimbley
It worked perfectly. However, I have a beginner-level question: when I tried to browse the methods used here in the Ruby and Rails API documentation, I wasn't able to find them (I looked on the official Ruby and Rails sites, in the API docs). E.g. I couldn't find what object CSV.parse() returns, and I didn't find the to_hash() and with_indifferent_access() methods... Maybe I looked in the wrong place or missed some basic principle of how to navigate the Ruby & Rails API docs. Can anyone share the best practice for reading the Ruby API docs?Consolata
is there a way to do this without loading the whole file into memory?Baily
@daveatflow: yes, see my answer below, which reads in the file one line at a time.Silique
If it's not a CSV file, you can use something like File.open('file').readlines.each { |line| key, value = line.split(/[$]/); record.create!(key: key, value: value) }Pachisi
@lokeshjain2008, it refers to the OP's model.Alongside
and for those who get the encoding error, this is how I fixed it. csv_text = File.read('...').force_encoding("UTF-8")Twilley
csv = CSV.parse(csv_text, :headers => true, :col_sep => ";") for other separator.Rhiannonrhianon
This method is inefficient! On huge CSV files the RAM usage skyrockets. The one below is better.Pipkin
@unmircea That is definitely true. Also, you are parsing extremely large files in Ruby?Pimbley
@Pimbley Yes. Trying to import some huge CSVs into Active Record. Check out this comparison of all the methods: dalibornasevic.com/posts/…Pipkin
With many rows I get: thread failed: inotify event queue has overflowedAraucanian
227

A simpler version of yfeldblum's answer that also works well with large files:

require 'csv'    

CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end

No need for with_indifferent_access or symbolize_keys, and no need to read in the file to a string first.

It doesn't keep the whole file in memory at once; it reads the file line by line and creates a Moulding per line.

Silique answered 2/5, 2012 at 21:3 Comment(6)
This is better for managing large file sizes right? Does it read in one line at a time?Lowney
@Simon: indeed. It doesn't keep the whole file in memory at once; it reads the file line by line and creates a Moulding per line.Silique
I have this error, do you know why?: ActiveModel::UnknownAttributeError: unknown attribute 'siren;nom_ent;adresse;complement_adresse;cp_ville;pays;region;departement;activite;date;nb_salaries;nom;prenom;civilite;adr_mail;libele_acti;categorie;tel' for TransactionNeuter
@AlphaNico Create a question with your problem. That error is unrelated to this, your Model objects seem out of sync.Pipkin
In this case, how do you write TestCases for this?Depraved
when I do this, I get Errno:ENOENT - it says no such file or directory but I get the CSV text in the error log, so it reads the file. But it specifically fails in the line CSV.foreach(csv_text, headers: true) do |row| this was the code: require 'csv' csv_text = File.read('app/assets/csv/plant_table.csv') CSV.foreach(csv_text, headers: true) do |row| Test.create!(row.to_hash) endSahaptin
14

The smarter_csv gem was specifically created for this use case: to read data from a CSV file and quickly create database entries.

  require 'smarter_csv'
  options = {}
  SmarterCSV.process('input_file.csv', options) do |chunk|
    chunk.each do |data_hash|
      Moulding.create!( data_hash )
    end
  end

You can use the chunk_size option to read N CSV rows at a time, and then use Resque in the inner loop to generate jobs which create the new records instead of creating them right away; this way you can spread the load of generating entries across multiple workers, as sketched below.
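
A minimal sketch of that pattern, assuming Resque is already configured; the job class name and queue are made up for illustration and are not part of smarter_csv:

require 'smarter_csv'
require 'resque'

# Illustrative Resque job; class name and queue name are assumptions.
class MouldingImportJob
  @queue = :csv_import

  def self.perform(rows)
    rows.each { |attrs| Moulding.create!(attrs) }
  end
end

# Each chunk is an array of up to 100 row hashes.
SmarterCSV.process('input_file.csv', chunk_size: 100) do |chunk|
  Resque.enqueue(MouldingImportJob, chunk)
end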

See also: https://github.com/tilo/smarter_csv

Knorr answered 14/12, 2013 at 20:11 Comment(6)
As the CSV class is included, I feel it's better to use it instead of adding or installing an additional gem. Granted, you didn't propose that a new gem be added to the application. It's so easy to add a series of individual gems, each for a specific purpose and before you know it your application has excessive dependencies. (I find myself consciously avoiding the addition of any gems. In my shop we need to justify the addition to our teammates.)Rustyrut
@Rustyrut it's also pretty easy to add a series of individual methods, each for a specific purpose and before you know it your application has excessive logic that you have to maintain. If a gem works, is well maintained, and uses little resources or can be quarantined to the relevant environments (i.e. Staging for production tasks) it seems to me always a better option to use the gem. Ruby and Rails are all about writing less code.Coheman
I have the following error, do you know why? ActiveModel::UnknownAttributeError: unknown attribute 'siren;nom_ent;adresse;complement_adresse;cp_ville;pays;region;departement;activite;date;nb_salaries;nom;prenom;civilite;adr_mail;libele_acti;categorie;tel' for TransactionNeuter
I tried this on a rake task, console returns: rake aborted! NoMethodError: undefined method `close' for nil:NilClass #42515543Actinomycosis
looks like someone already replied - just prefix your CSV file location with ./Knorr
@Rustyrut chunking the CSV processing, improving the speed and saving memory might be a good justification for adding a new gem ;)Knorr
5

You might try Upsert:

require 'upsert' # add this to your Gemfile
require 'csv'    

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end

If this is what you want, you might also consider getting rid of the auto-increment primary key from the table and making name the primary key. Alternatively, if there is some combination of attributes that forms a primary key, use that as the selector. No index is required; it just makes the upsert faster.
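
For example, if suppliers_code together with supplier_id happened to uniquely identify a row (an assumption about your data, not something shown in the question), the selector could be built from both columns:

selector = { suppliers_code: row['suppliers_code'], supplier_id: row['supplier_id'] }
u.row selector, row.to_hash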

Euphonic answered 7/3, 2013 at 21:5 Comment(0)
4

This can help. It has code examples too:

http://csv-mapper.rubyforge.org/

Or for a rake task for doing the same:

http://erikonrails.snowedin.net/?p=212

Erichericha answered 10/12, 2010 at 16:14 Comment(1)
erikonrails.snowedin.net/?p=212 is broken; I opened an issue about doing this with a rake task here: #42515543Actinomycosis
3

It is better to wrap the database-related work inside a transaction block. The snippet below is a full example of seeding a set of languages into a Language model:

require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end

The snippet below is a partial languages.csv file:

aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...
Aero answered 3/11, 2016 at 3:27 Comment(0)
1

The better way is to include it in a rake task. Create an import.rake file inside /lib/tasks/ and put this code in that file:

desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',') # note: a plain split(',') will not handle quoted fields that contain commas
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end

After that, run this command in your terminal: rake csv_model_import[file.csv,Name_of_the_Model]

Ocher answered 24/3, 2017 at 7:26 Comment(0)
1

I know it's an old question, but it's still in the first 10 links on Google.

It is not very efficient to save rows one by one, because that causes a database call inside the loop, and you'd better avoid that, especially when you need to insert huge amounts of data.

It's better (and significantly faster) to use batch insert.

INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
    ('s1', 'supplier1', 1.111), 
    ('s2', 'supplier2', 2.222)

You can build such a query manually and then do Model.connection.execute(RAW SQL STRING) (not recommended), or use the gem activerecord-import (first released on 11 Aug 2010); in that case just put the data in an array of rows and call Model.import rows.

Refer to the gem docs for details.
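
A minimal sketch of the activerecord-import approach (the file name is illustrative, and the gem needs to be in your Gemfile):

require 'csv'

mouldings = []
CSV.foreach('mouldings.csv', headers: true) do |row|
  mouldings << Moulding.new(row.to_hash)
end

# activerecord-import turns this into a single bulk INSERT
Moulding.import mouldings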

Nady answered 31/12, 2017 at 1:15 Comment(0)
0

Use this gem: https://rubygems.org/gems/active_record_importer

class Moulding < ActiveRecord::Base
  acts_as_importable
end

Then you can use:

Moulding.import!(file: File.open(PATH_TO_FILE))

Just be sure that your headers match the column names of your table.

Gesualdo answered 28/2, 2017 at 7:53 Comment(0)
0

The following module can be extended onto any model, and it will import the data according to the column headers defined in the CSV.

Note:

  • This is a great internal tool; for customer-facing use I would recommend adding safeguards and sanitization
  • The column names in the CSV must exactly match the DB schema or it won't work
  • It could be further improved by using the table name to get the headers instead of defining them in the file

Create a file named "csv_importer.rb" in your models/concerns folder

module CsvImporter
  extend ActiveSupport::Concern  
  require 'csv'
  
  def convert_csv_to_book_attributes(csv_path)
    csv_rows = CSV.open(csv_path).each.to_a.compact
    columns = csv_rows[0].map(&:strip).map(&:to_sym)
    csv_rows.shift
    
    return columns, csv_rows
  end
  
  def import_by_csv(csv_path)
    columns, attributes_array = convert_csv_to_book_attributes(csv_path)
    
    message = ""
    begin
      # `import` here relies on a bulk-insert method such as the one provided by the activerecord-import gem
      self.import columns, attributes_array, validate: false
      message = "Import Successful."
    rescue => e
      message = e.message
    end
    
    return message
  end
end

Add extend CsvImporter to whichever model you would like to give this functionality to, as in the sketch below.
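
For example, assuming a Book model (the helper name in the module hints at one):

class Book < ApplicationRecord
  extend CsvImporter
end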

In your controller you can have an action like the following to utilize this functionality:

def import_file
   model_name = params[:table_name].singularize.camelize.constantize
   csv = params[:file].path
   @message = model_name.import_by_csv(csv)
end
Mensal answered 26/5, 2021 at 19:13 Comment(0)
-2

It's better to use CSV::Table and String#encode(universal_newline: true), which converts CRLF and CR line endings to LF.
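
A minimal sketch of that combination (the file name is illustrative):

require 'csv'

csv_text = File.read('mouldings.csv').encode(universal_newline: true) # normalize CRLF/CR to LF
table = CSV.parse(csv_text, headers: true)                            # returns a CSV::Table
table.each { |row| Moulding.create!(row.to_hash) }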

Perales answered 31/12, 2014 at 7:16 Comment(1)
What is your proposed solution?Rustyrut
-3

If you want to use SmarterCSV:

all_data = SmarterCSV.process(
             params[:file].tempfile, 
             { 
               :col_sep => "\t", 
               :row_sep => "\n" 
             }
           )

This handles tab-delimited data in each row ("\t"), with rows separated by newlines ("\n").

Velours answered 23/4, 2014 at 1:10 Comment(0)
