Fuzzy String Matching with Rails (Tire) and ElasticSearch
I have a Rails application that is now set up with ElasticSearch and the Tire gem to search a model, and I'm wondering how to set it up to do fuzzy string matching on certain indexed fields. The model indexes fields such as title and description, and I want fuzzy matching on some of them, but I'm not sure where to configure it. My code is below if you'd like to comment. Thanks!

In the controller:

    def search
      @resource = Resource.search(params[:q], :page => (params[:page] || 1),
                                  :per_page => 15, :load => true)
    end
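
A paginated search ultimately needs an integer offset (`from`) and page size (`size`), while `params[:page]` arrives from the request as a string (or `nil`). A minimal sketch of that conversion, assuming the usual offset = (page - 1) * per_page convention, which is the same arithmetic the answer's custom `search` method uses; `search_window` is a hypothetical helper, not part of Tire:

```ruby
# Hypothetical helper (not part of Tire): turn raw request params into the
# from/size pair that an ElasticSearch search body expects.
def search_window(page_param, per_page_param, default_per_page = 10)
  page = page_param.to_i                      # nil.to_i and "".to_i are 0
  page = 1 if page < 1                        # missing or junk => first page
  per_page = per_page_param.to_i
  per_page = default_per_page if per_page < 1
  { from: (page - 1) * per_page, size: per_page }
end

window = search_window("3", "15")   # from: 30, size: 15
first  = search_window(nil, nil)    # from: 0, size: 10
```

Clamping bad input to page 1 avoids sending a negative `from`, which ElasticSearch would reject.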

In the Model:

    class Resource < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      belongs_to :user
      has_many :resource_views, :class_name => 'UserResourceView'

      has_reputation :votes, source: :user, aggregated_by: :sum

      attr_accessible :title, :description, :link, :tag_list, :user_id, :youtubeID
      acts_as_taggable

      mapping do
        indexes :id,          :index => :not_analyzed
        indexes :title,       :analyzer => 'snowball', :boost => 40
        indexes :tag_list,    :analyzer => 'snowball', :boost => 8
        indexes :description, :analyzer => 'snowball', :boost => 2
        indexes :user_id,     :analyzer => 'snowball'
      end
    end
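
For reference, "fuzzy" matching usually means tolerating a few typos, measured as edit distance: the number of single-character insertions, deletions and substitutions needed to turn one string into another. ElasticSearch's fuzzy queries are built on this idea. The plain-Ruby sketch below only illustrates the metric; it is not how Tire or ElasticSearch implement it internally, and `fuzzy_match?` with its `max_edits` threshold is a hypothetical helper:

```ruby
# Dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions and substitutions turning a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a                 # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                              # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,                 # deletion
               curr[j - 1] + 1,             # insertion
               prev[j - 1] + cost].min      # substitution (or exact match)
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: a candidate "fuzzily" matches if it is within
# max_edits edits of the query term, ignoring case.
def fuzzy_match?(term, candidate, max_edits = 2)
  levenshtein(term.downcase, candidate.downcase) <= max_edits
end

fuzzy_match?("elasticsaerch", "ElasticSearch")   # true: two substitutions
```

Computing this against every title in Ruby would not scale, which is why the matching belongs in the search engine's query or analyzers rather than in the model.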
Asked by Claypan on 18/1/2013 at 2:08

Try creating custom analyzers (for example n-gram based ones) to get partial-word matching that the built-in snowball stemming doesn't give you. Check out my example below. It also uses Mongoid and file attachments; ignore those parts if you don't need them:

    class Document
      include Mongoid::Document
      include Mongoid::Timestamps
      include Tire::Model::Search
      include Tire::Model::Callbacks

      field :filename, type: String
      field :md5, type: String
      field :tags, type: String
      field :size, type: String

      index({md5: 1}, {unique: true})
      validates_uniqueness_of :md5


      DEFAULT_PAGE_SIZE = 10

      settings :analysis => {
          :filter => {
              :ngram_filter => {
                  :type => "edgeNGram",
                  :min_gram => 2,
                  :max_gram => 12
              },
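              :custom_word_delimiter => {
                  :type => "word_delimiter",
                  :preserve_original => "true",
                  :catenate_all => "true"
              },
              # Referenced by the :suggestions analyzer below; without a
              # definition here, creating the index would fail.
              :suggestions_shingle => {
                  :type => "shingle"
              }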
          }, :analyzer => {
              :index_ngram_analyzer => {
                  :type => "custom",
                  :tokenizer => "standard",
                  :filter => ["lowercase", "ngram_filter", "asciifolding", "custom_word_delimiter"]
              },
              :search_ngram_analyzer => {
                  :type => "custom",
                  :tokenizer => "standard",
                  :filter => ["standard", "lowercase", "ngram_filter", "custom_word_delimiter"]
              },
              :suggestions => {
                  :tokenizer => "standard",
                  :filter => ["suggestions_shingle"]
              }
          }
      } do
        mapping {
          indexes :id, index: :not_analyzed
          indexes :filename, :type => 'string', :store => 'yes', :boost => 100, :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
          indexes :tags, :type => 'string', :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
          indexes :attachment, :type => 'attachment',
                  :fields => {
                      :content_type => {:store => 'yes'},
                      :author => {:store => 'yes', :analyzer => 'keyword'},
                      :title => {:store => 'yes'},
                      :attachment => {:term_vector => 'with_positions_offsets', :boost => 90, :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer},
                      :date => {:store => 'yes'}
                  }
        }
      end


      def to_indexed_json
        self.to_json(:methods => [:attachment])
      end

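      def attachment
          # `path` is not declared as a field above; it is assumed to exist on the model.
          path_to_file = "#{Rails.application.config.document_library}#{path}/#{filename}"
          # Read in binary mode so PDFs and other binaries are encoded byte-for-byte.
          Base64.encode64(File.binread(path_to_file))
      end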

      def self.search(query, options = {})
        tire.search do
          query { string "#{query}", :default_operator => :AND, :default_field => 'attachment', :fields => ['filename', 'attachment', 'tags'] }
          highlight :attachment
          # Params arrive as strings, so convert both before doing arithmetic.
          page = (options[:page] || 1).to_i
          page = 1 if page < 1
          search_size = (options[:per_page] || DEFAULT_PAGE_SIZE).to_i
          from((page - 1) * search_size)
          size search_size
          sort { by :_score, :desc }
          if (options[:facet])
            filter :terms, :tags => [options[:facet]]
            facet 'global-tags', :global => true do
              terms :tags
            end
            facet 'current-tags' do
              terms :tags
            end
          end
        end
      end
    end
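
Most of the "fuzziness" in this example comes from the `edgeNGram` filter: each token is expanded into its leading substrings (here 2 to 12 characters), so a partial word like "elas" can match "elasticsearch". A plain-Ruby sketch of that expansion, assuming whitespace tokenization and ignoring the other filters in the chain; the real filter runs inside ElasticSearch, and `edge_ngrams`/`index_terms` are hypothetical helpers:

```ruby
# Roughly what an edgeNGram token filter with min_gram 2 and max_gram 12
# emits for a single token: its prefixes of length 2..12.
def edge_ngrams(token, min_gram: 2, max_gram: 12)
  max = [max_gram, token.length].min
  (min_gram..max).map { |n| token[0, n] }   # empty when token is shorter than min_gram
end

# Index-time expansion of a whole string. Splitting on whitespace is a
# simplification; ElasticSearch's standard tokenizer is much smarter.
def index_terms(text)
  text.downcase.split.flat_map { |word| edge_ngrams(word) }.uniq
end

terms = index_terms("Tire ElasticSearch")
terms.include?("elas")       # true: the partial word now matches
edge_ngrams("elasticsearch") # longest prefix is "elasticsearc" (12 chars)
```

Two consequences are visible here: a word longer than `max_gram` is never indexed whole, and because the answer's `search_ngram_analyzer` applies the same filter, query terms are expanded into prefixes too, which widens matches considerably.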

Hope it helps!

Answered by Weimaraner on 25/1/2013 at 20:17

Comments:
Helpful, but ElasticSearch ended up being far too cumbersome, so we switched to PostgreSQL. Thanks, though! (Claypan)
Very helpful... with a little patience, your example worked like a charm :) (Daven)
What effect does the :store => 'yes' parameter have? (Ecthyma)
@Ecthyma From the docs: "Set to true to actually store the field in the index, false to not store it. Defaults to false (note, the JSON document itself is stored, and it can be retrieved from it)." - elasticsearch.org/guide/en/elasticsearch/reference/current/… That said, I think Tire stores values implicitly, so this is probably unnecessary. (Schafer)
