Index the results of a method in ElasticSearch (Tire + ActiveRecord)

I'm indexing a data set for elasticsearch using Tire and ActiveRecord. I have an Artist model, which has_many :images. How can I index a method of the Artist model which returns a specific image? Or alternatively reference a method of the associated model? My desired Artist result will include the paths for the primary Image associated with the Artist (both the original and the thumbnail).

I've tried this mapping:

mapping do
  indexes :id,                  :index    => :not_analyzed
  indexes :name                     
  indexes :url
  indexes :primary_image_original       
  indexes :primary_image_thumbnail
end

to reference these Artist methods:

def primary_image_original
  images.where(:priority => 'primary').first.original
end

def primary_image_thumbnail
  images.where(:priority => 'primary').first.thumbnail_150
end

This just ignores the indexed methods. Based on other answers, such as "Elasticsearch, Tire, and Nested queries / associations with ActiveRecord", I tried this:

mapping do
  indexes :id,                  :index    => :not_analyzed
  indexes :name 
  indexes :url
  indexes :images do
    indexes :original
    indexes :thumbnail_150
    indexes :priority
  end
end

def to_indexed_json
    to_json(include: { images: { only: [:original, :thumbnail_150, :priority] } } )
end

But this also doesn't return what I'm after. I've spent several hours googling and reading the elasticsearch and Tire documentation and haven't found a working example of this pattern to follow. Thanks for your ideas!

Bricklaying answered 28/11, 2012 at 7:37 Comment(5)
Note that the other indexed fields on the Artist model (name and url) are indexed and searchable as expected using the mappings above. - Bricklaying
Could you try to use the :as option? I don't have time to dig into this more at the moment, unfortunately. - Ostrowski
Also, could you please provide a pastie/hastebin/etc with a link to the output of to_indexed_json? - Ostrowski
Thanks for the quick reply, @karmi! (And thanks for the awesome gem!) I was able to figure out two ways to index a method of the Artist model. Please see my pastie here: pastie.org/5456743. However, the problem I'm seeing now is that both of these approaches increase indexing time by at least 60x. Without the methods, indexing a batch of 1000 records takes less than a second. With the methods, indexing a batch of 1000 records takes more than a minute. How can I speed up indexing in this case? I have several million records to index. Is there a better approach here? Thanks again. - Bricklaying
Updated pastie with relevant methods: pastie.org/5456766 - Bricklaying

To include your solution to the indexing problem here:

Indexing associations

One way to index a method is to include it in the to_json call:

def to_indexed_json
  to_json( 
    :only   => [ :id, :name, :normalized_name, :url ],
    :methods   => [ :primary_image_original, :primary_image_thumbnail, :account_balance ]
  )
end
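
If you would rather control the document shape explicitly, to_indexed_json can also build a hash and serialize it, since Tire only needs the method to return a JSON string. A minimal sketch, assuming a plain-Ruby stand-in for the Artist model (the constructor and the stubbed image methods are hypothetical, just to keep the example runnable without a database):

```ruby
require 'json'

# Stand-in for the ActiveRecord model; only the fields used below.
class Artist
  attr_reader :id, :name, :url

  def initialize(id, name, url, original, thumbnail)
    @id, @name, @url = id, name, url
    @original, @thumbnail = original, thumbnail
  end

  # In the real model these would look up the primary image.
  def primary_image_original;  @original;  end
  def primary_image_thumbnail; @thumbnail; end

  # Build the indexed document by hand, then serialize it.
  def to_indexed_json
    {
      :id   => id,
      :name => name,
      :url  => url,
      :primary_image_original  => primary_image_original,
      :primary_image_thumbnail => primary_image_thumbnail
    }.to_json
  end
end
```

This is equivalent in spirit to the :methods variant above; it is just more verbose and makes every indexed key visible in one place.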

Another way, and the preferable one, is to use the :as option in the mapping block:

mapping do
  indexes :id, :index    => :not_analyzed
  indexes :name             
  # ...

  # Relationships
  indexes :primary_image_original, :as => 'primary_image_original'
  indexes :account_balance,        :as => 'account_balance'
end

Fighting n+1 queries when importing

The slow indexing is most probably due to n+1 queries in the database: for every artist you index, you issue separate queries for its images (original and thumbnail). A much more performant way is to load the associated records for the whole batch in one query; see Eager Loading Associations in Rails Guides.

The Tire Index#import method, and the import Rake task, allow you to pass parameters which are then sent to the paginate method down the wire.

So let's compare the naive approach:

bundle exec rake environment tire:import CLASS=Article FORCE=true
Article Load (7.6ms)  SELECT "articles".* FROM "articles" LIMIT 1000 OFFSET 0
Comment Load (0.2ms)  SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 1)
Comment Load (0.1ms)  SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 2)
...
Comment Load (0.3ms)  SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 100)

And when we pass the include fragment:

bundle exec rake environment tire:import PARAMS='{:include => ["comments"]}'  CLASS=Article FORCE=true 
Article Load (8.7ms)  SELECT "articles".* FROM "articles" LIMIT 1000 OFFSET 0
Comment Load (31.5ms) SELECT "comments".* FROM "comments" WHERE ("comments".article_id IN (1,2, ... ,100))

Much better :) Please try it out and let me know if it solves your issue.


You can also try it out in the Rails console: Article.import vs. Article.import(include: ['comments']). As a side note, this exact problem was the reason for supporting the params hash in the whole importing toolchain in Tire.
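
One caveat when applying this to the Artist model from the question: calling images.where(...) builds a fresh query for every record, so it hits the database even when images were eager loaded. Selecting from the already-loaded collection avoids that. A minimal sketch, assuming plain-Ruby stand-ins (the Image struct and the Artist constructor are illustrative, and primary_image is a hypothetical helper name) so that it runs without a database:

```ruby
# Plain-Ruby stand-ins so the sketch runs without a database.
Image = Struct.new(:priority, :original, :thumbnail_150)

class Artist
  attr_reader :images # the has_many :images association in the real model

  def initialize(images)
    @images = images
  end

  # Scan the already-loaded collection instead of calling images.where(...),
  # and memoize the result so both accessors share a single lookup.
  def primary_image
    return @primary_image if defined?(@primary_image)
    @primary_image = images.detect { |image| image.priority == 'primary' }
  end

  def primary_image_original
    primary_image && primary_image.original
  end

  def primary_image_thumbnail
    primary_image && primary_image.thumbnail_150
  end
end
```

Unlike the methods in the question, these return nil instead of raising when an artist has no primary image, which matters in a bulk import over several million records.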

Ostrowski answered 12/12, 2012 at 20:20 Comment(1)
+1 for the answer. I have a CPU-intensive function that I want to get indexed by Elasticsearch, but it takes a few arguments and there's no way I can avoid them. Is there any way to index a function that takes arguments? Thanks a lot. - Gaylegayleen
