How do I retrieve a list of created IDs for bulk insert in Active Record?
Asked Answered
P

5

8

I have three models:

class Coupon < ActiveRecord::Base
  belongs_to :event
  has_many :coupon_events, :dependent => :destroy
  has_many :events, :through => :coupon_events
end 

class Event < ActiveRecord::Base
  belongs_to :event
  has_many :coupon_events, :dependent => :destroy
  has_many :coupons, :through => :coupon_events
end

class CouponEvent < ActiveRecord::Base
  belongs_to :coupon
  belongs_to :event
end

I read through a CSV file to create coupons and coupon_events. This is terribly inefficient since the records are created one at a time and result in multiple queries each including the two insert statements.

I'd like to use a single insert query like this:

coupon_string = " ('abc','AAA'), ('123','BBB')"
Coupon.connection.insert("INSERT INTO coupons (code, name) VALUES"+coupon_string)

I then need to create the second insert query for the CouponEvent model, but I need a list of the returned coupon_ids. Is there a built in method to retrieve the IDs at the time of the insert?

Presbytery answered 5/12, 2012 at 7:7 Comment(1)
If what you want is just the IDs that you got from first query then just store the returned IDs in a hash to use at the time of insert.Cum
W
-2

If you are using mysql and you are not inserting more rows in another script/process, you can get the id of the first row inserted by using last_insert_id()

    first_id = ActiveRecord::Base.connection.execute("select last_insert_id()").first[0]

And then the ids of the other records are sequentially generated.

i.e.

    data = %w(one two three)
    to_insert = "('" + data.join("'), ('") + "')"
    Model.connection.insert("INSERT INTO models(name) VALUES #{to_insert}")
    first_id = ActiveRecord::Base.connection.execute("select last_insert_id()").first[0].to_i
    hash = {}
    data.each_with_index {|d, i| hash[first_id + i] = d}
Walther answered 5/12, 2012 at 8:51 Comment(9)
Your first_id value is actually returned from #insert statement of MySQL AR adapter. At last that's what I think, but I don't use MySQL so I cannot confirm it.Supramolecular
And for what he says, that's he problem, he's making a couple of inserts at the same time and he needs the id for each one of those records, so if he's using MySQL, with last_insert_id() he will get the ID of the first record inserted, and all the other records are sequentially generated (unless some other process is also inserting records at the same time).Walther
Interesting, rorra. Since I am creating the coupons at one time, can you say with certainty that the first, last, and all IDs in between would all belong to this insert? Or if I'm creating say, 50 coupons, and the last inserted coupon_id is 25, could I be certain that the coupon IDs created with be 26-75? It seems like it, but I must be certain.Presbytery
Yes, as long as other process doesnt insert records in the table at the same time, the ids will be 26_75, but just in case write a quick test case just to be sure. This code worked fine for me in the pastWalther
To be certain I don't grab the wrong IDs, I lock the table to writes, get Coupon.last.id and the first with MBO's suggestion of retrieving it via Model.connection.insert. Thanks everyone.Presbytery
This is a poor general solution, because it is depending on the database generating sequential ids. While databases often do that, they won't do that if there are other users inserting into the same table. Locking the table just degrades performance, and I'm assuming that you were using a bulk import in the first place to improve performance. As well, there are other reasons that a db might not generate sequential ids, such as id partitioning (by range, where there are ranges of ids assigned to each db, not necessarily adjacent or even increasing, and also by even/odd)Miscellanea
Actually this is a great solution under the conditions that I commented on the first lines of my answer: "If you are using mysql and you are not inserting more rows in another script/process, you can get the id of the first row inserted by using last_insert_id()", and most important, it answers the question: "How do I retrieve a list of created IDs for bulk insert in Active Record?". You are assuming too much. Feel free to write an answer with real code and a real solution.Walther
@Walther Actually it's a poor solution generally. I think OP himself might run into hard to debug issues if he did something like this. I would ask the OP: What do you need the IDS for? if it's to manually active callbacks for the model, consider having a separate thread that queries the table to see which IDs were newly inserted, and do something with those IDS that way.Schiedam
No, its a great solution under the conditions I said, its quickly and pragmatic, feel free to prove that the solution I wrote under the conditions I wrote for the problem described, will became in a bug hard to debug. Feel free to write a better solution in a new answer, with callbacks that nobody asked about, or threads that nobody asked about. Just to be clear: "If you are using mysql and you are not inserting more rows in another script/process, you can get the id of the first row inserted by using last_insert_id()"Walther
M
3

At the moment, the best (but not ideal) solution is to bulk import using "activerecord-import". Unfortunately, that gem does not return the inserted ids, so you'd have to turn around and query to get the ids. That is, you'd bulk insert the Events models, query the db to get them all back in memory. Now you have the Event ids, so you can create the Coupons and bulk insert them. Rinse lather repeat for CouponEvents.

Compared to one round trip per Event, Coupon and CouponEvent - probably thousands of round trips for a file with thousands of rows - you are only doing 2 round trips per model - One to insert the Event, one to fetch the Events back with the ids, ditto Coupon and CouponEvent - total 6 round trips.

Miscellanea answered 3/7, 2013 at 14:10 Comment(7)
I created a version of activerecord-import that does return the inserted ids, at least for postgres, using the technique in @MBO's answer. It's here: github.com/GoodMeasuresLLC/activerecord-import , with a push request for the official version.Miscellanea
Does active record import use 1 round trip to send the queries?Schiedam
Also, are you sure your gem returns the correct IDS in a multi-threaded environment? IE 10 different threads running activerecord-import into the same table?Schiedam
yes, only one round trip - that would be the point. For the multithread thing, it is relying on the db to create the ids, so they don't have to be sequential. The autoincrement feature in db isn't broken for multithreading.Miscellanea
You can use the standard fork of activerecord-import, my PR was accepted last week. That's github.com/zdennis/activerecord-importMiscellanea
@Miscellanea is there a chance that one would get different IDs from the ones I'd saved, like in a situation with other users also writing?Bathtub
No, because transactions. ie, that's not how multiuser databases work.Miscellanea
S
2

Actually I'm not sure if this colud work (if it creates one insert query), but you can try to use #create method with array of parameters:

new_coupons = Coupon.create([
  { :code => "abc", :name => "AAA" },
  { :code => "123", :name => "BBB" }
])

CouponEvent.create([
  { :enevt_id => ..., coupon_id: ...},
  ...
])

To create parameters list for CouponEvent, you neet map returned collection of new_coupons to id's and add event_id's based on coupon codes/names (depends haw it's stored in CVS file).

UPDATE:

I checked by myself, and if first solution doesn't work (I don't simple have models without uniqueness constraints in my code, so I haven't checked), and you use PostgreSQL, you can always do something like this:

res = Coupon.connection.execute(<<-EOSQL)
  INSERT INTO coupons (code, name)
  VALUES #{values}
  RETURNING id, code
EOSQL

You need that last "Returning" clause, so you can fetch id's inserted along with code of inserted row. The you need to map resultset:

res.map {|row|
  { :coupon_id => row["id"],
    :event_id => events.find { |e| e.coupon_code == row["code"] }
  }
}

There is no standard way in SQL to return columns of inserted rows, "RETURNING" clause works in PostgreSQL only, so if you use different database, you need check documentation or insert rows one by one.

You can't also use connection.insert, as in ActiveRecord it returns only id of one inserted row, instead of all rows.

Supramolecular answered 5/12, 2012 at 8:25 Comment(3)
Alas, I'm using MySQL, but I just found that I might be able to get the last_insert_id. See dev.mysql.com/doc/refman/5.0/en/getting-unique-id.htmlPresbytery
The version using Coupon.create isn't very useful - it creates and executes one sql query per coupon.Miscellanea
But the second version of the answer with Coupon.connection.execute() is awesome, just what I needed.Miscellanea
T
2

The way to do this is to insert the records with a unique import_id value. The steps are:

  1. Add an import_id column to the table. Could be INT or VARCHAR depending on how you generate random IDs.

  2. Before the first INSERT, generate a random ID.

  3. Do the first multi-value INSERT, using the same import_id for each row.

  4. SELECT id FROM first_table WHERE import_id=<the random import ID>

  5. Generate second multi-value INSERT using the returned IDs.

Thermae answered 21/11, 2013 at 19:26 Comment(0)
M
0

connection.insert only return one id, common table expression works for me

    insert_sql = <<-SQL
      WITH inserted_ids AS (
        INSERT INTO clients (email, name) VALUES #{array.join(', ')}
        RETURNING id
      )
      SELECT * FROM inserted_ids
    SQL
    result = ActiveRecord::Base.connection.execute(insert_sql)
Mountebank answered 29/7, 2016 at 1:20 Comment(1)
RETURNING is not supported by MySQL?Despairing
W
-2

If you are using mysql and you are not inserting more rows in another script/process, you can get the id of the first row inserted by using last_insert_id()

    first_id = ActiveRecord::Base.connection.execute("select last_insert_id()").first[0]

And then the ids of the other records are sequentially generated.

i.e.

    data = %w(one two three)
    to_insert = "('" + data.join("'), ('") + "')"
    Model.connection.insert("INSERT INTO models(name) VALUES #{to_insert}")
    first_id = ActiveRecord::Base.connection.execute("select last_insert_id()").first[0].to_i
    hash = {}
    data.each_with_index {|d, i| hash[first_id + i] = d}
Walther answered 5/12, 2012 at 8:51 Comment(9)
Your first_id value is actually returned from #insert statement of MySQL AR adapter. At last that's what I think, but I don't use MySQL so I cannot confirm it.Supramolecular
And for what he says, that's he problem, he's making a couple of inserts at the same time and he needs the id for each one of those records, so if he's using MySQL, with last_insert_id() he will get the ID of the first record inserted, and all the other records are sequentially generated (unless some other process is also inserting records at the same time).Walther
Interesting, rorra. Since I am creating the coupons at one time, can you say with certainty that the first, last, and all IDs in between would all belong to this insert? Or if I'm creating say, 50 coupons, and the last inserted coupon_id is 25, could I be certain that the coupon IDs created with be 26-75? It seems like it, but I must be certain.Presbytery
Yes, as long as other process doesnt insert records in the table at the same time, the ids will be 26_75, but just in case write a quick test case just to be sure. This code worked fine for me in the pastWalther
To be certain I don't grab the wrong IDs, I lock the table to writes, get Coupon.last.id and the first with MBO's suggestion of retrieving it via Model.connection.insert. Thanks everyone.Presbytery
This is a poor general solution, because it is depending on the database generating sequential ids. While databases often do that, they won't do that if there are other users inserting into the same table. Locking the table just degrades performance, and I'm assuming that you were using a bulk import in the first place to improve performance. As well, there are other reasons that a db might not generate sequential ids, such as id partitioning (by range, where there are ranges of ids assigned to each db, not necessarily adjacent or even increasing, and also by even/odd)Miscellanea
Actually this is a great solution under the conditions that I commented on the first lines of my answer: "If you are using mysql and you are not inserting more rows in another script/process, you can get the id of the first row inserted by using last_insert_id()", and most important, it answers the question: "How do I retrieve a list of created IDs for bulk insert in Active Record?". You are assuming too much. Feel free to write an answer with real code and a real solution.Walther
@Walther Actually it's a poor solution generally. I think OP himself might run into hard to debug issues if he did something like this. I would ask the OP: What do you need the IDS for? if it's to manually active callbacks for the model, consider having a separate thread that queries the table to see which IDs were newly inserted, and do something with those IDS that way.Schiedam
No, its a great solution under the conditions I said, its quickly and pragmatic, feel free to prove that the solution I wrote under the conditions I wrote for the problem described, will became in a bug hard to debug. Feel free to write a better solution in a new answer, with callbacks that nobody asked about, or threads that nobody asked about. Just to be clear: "If you are using mysql and you are not inserting more rows in another script/process, you can get the id of the first row inserted by using last_insert_id()"Walther

© 2022 - 2024 — McMap. All rights reserved.