What is the fastest way to create mass HABTM associations in Rails?
Asked Answered
D

4

17

I have two tables, with a HABTM relationship in Rails. Something like the following:

class Foo < ActiveRecord::Base
  has_and_belongs_to_many :bars
end

class Bar < ActiveRecord::Base
  has_and_belongs_to_many :foos
end

Now I have a new Foo object, and want to mass-assign thousands of bars to it, which I've pre-loaded:

@foo = Foo.create
@bars = Bar.find_all_by_some_attribute(:a)

What's the fastest way to do this? I've tried:

@foo.bars = @bars
@foo.bars << @bars

And both run really slow, with an entry like the following for each bar:

bars_foos Columns (1.1ms) SHOW FIELDS FROM bars_foos SQL (0.6ms) INSERT INTO bars_foos (bar_id, foo_id) VALUES (100, 117200)

I looked at ar-extensions, but the import function doesn't seem to work without a model (Model.import) which precludes its use for a join table.

Do I need to write the SQL, or does Rails have a prettier way?

Dorchester answered 4/2, 2010 at 21:37 Comment(1)
really? no one? You guys jump all over the lay-ups and "best practice" questions :)Dorchester
B
10

I think your best bet performance-wise is going to be to use SQL, and bulk insert multiple rows per query. If you can build an INSERT statement that does something like:

INSERT INTO foos_bars (foo_id,bar_id) VALUES (1,1),(1,2),(1,3)....

You should be able to insert thousands of rows in a single query. I didn't try your mass_habtm method, but it seems like you could to something like:

bars = Bar.find_all_by_some_attribute(:a)
foo = Foo.create
values = bars.map {|bar| "(#{foo.id},#{bar.id})"}.join(",")
connection.execute("INSERT INTO foos_bars (foo_id, bar_id) VALUES #{values}")

Also, if you are searching Bar by "some_attribute", make sure you have that field indexed in your database.

Billman answered 8/2, 2010 at 3:20 Comment(1)
my mass_habtm just combines the queries into a single search/insert query, which probably doesn't gain that much over what you have here. I hate selecting my own, so thanks for at least giving me a viable option.Dorchester
A
7

You still might have a look at activerecord-import. It's right that it doesn't work without a model, but you could create a Model just for the import.

class FooBar < ActiveRecord::Base; end

FooBar.import [:foo_id, :bar_id], [[1,2], [1,3]]

You can wrap this in a transaction to ensure the HABTM gets fully populated, as in here:

ActiveRecord::Base.transaction do
  imported_foo = Foo.import( foo_names, foo_values )
  imported_bar = Bar.import( bar_names, bar_values )
  FooBar.import( [:foo_id, :bar_id], imported_foo.ids.zip(imported_bar.ids)
end
Abhorrence answered 13/7, 2011 at 14:18 Comment(0)
D
1

This was faster than the equivalent native rails code by a factor of 7:

class << Foo
  def mass_habtm(attr_array)
    attr_str = attr_array.map{|a| %Q{'#{a}'} }.uniq.join(",")
    self.connection.execute(%Q{insert into foos_bars (foo_id,bar_id) 
                     select distinct foos.id,bars.id from foos,bars 
                     where foos.id = #{self.id} 
                     and bars.some_attribute in (#{attr_str})})
  end
end

It seems to me that this is a straightforward enough operation that it should be supported efficiently in Rails, I would love to hear if anyone has a cleaner way.

I'm running 2.2.2, maybe it's implemented more efficiently in 3.x? and found the same on 3.0.2.

Dorchester answered 5/2, 2010 at 0:0 Comment(0)
M
-5

Honestly, has_and_belongs_to_many is a very antiquated way of doing things. You should probably look into has_many :through, which is the new way of doing join tables, and has been for quite some time.

class Foo < ActiveRecord::Base
  has_many :foobars
  has_many :bars, :through => :foobars

  def add_many_bars(bars)
    bars.each do |bar|
      self.bars << bar
    end
  end
end

class Bar < ActiveRecord::Base
  has_many :foobars
  has_many :foos, :through => :foobars
end

class FooBar < ActiveRecord::Base
  belongs_to :foo
  belongs_to :bar
end

Also, you should try running the same in production and see what kind of performance you get, as a lot of caching goes on in production that doesn't necessarily occur in development.

Money answered 6/2, 2010 at 19:24 Comment(3)
Not to be a jerk, but you in no way addressed the main question - the speed of creating the relations, other than speculating that production might be better. While caching may help, it almost certainly won't change the way the SQL is formulated. I'd also argue that habtm is a better candidate for optimizing this stuff, since has_many->through requires a model class, which means there may be callbacks in the join model.Dorchester
And I'm pretty sure your implementation is slower than the one I had presented as not being fast enough.Dorchester
Yes, but having a model on the join allows you to do other things with finders and other options that you may not have had access to. At least this way, you have another model to put your code in rather than repeating it in both models if you want to go back and forth.Money

© 2022 - 2024 — McMap. All rights reserved.