pg_search: how to prioritize exact word matches?
Asked Answered
P

1

6

Probem: When searching texts with multiple occurrences of words similar to the search query they get higher rank than texts with one exact match.

Example: Say search query is "productivity", then "production of organic products" gets higher rank than "labour productivity" just because it contains two similar words, though no exact word matches.

Question: What's the best way to prioritize records that have exact matches?

Here's my pg_scope:

  pg_search_scope :search,
                  :against => {
                    :title => 'A',
                    :search_string => 'B'
                  },
                  :using => {
                    :tsearch => {
                      :dictionary => 'english',
                      :any_word => 'true'
                    }
                  }

Thx.

Pe answered 1/6, 2014 at 13:6 Comment(3)
Did you ever figure this out?Landing
@Landing I didn't find a proper solution. I ended up applying an exact match order in my query .select(Dataset.arel_table[:title].matches("%#{q}%").to_sql << 'as exact_match').order('exact_match desc').search(q) But if I was doing it again I would not use pg_search and use elastic search instead elastic.coPe
You could check out fuzzystrmatch and use levenshtein to give check for exactness postgresql.org/docs/9.1/fuzzystrmatch.htmlPennypennyaliner
C
0

If you can manipulate the search query, have a look around at cover density ranking (ts_rank_cd() instead of ts_rank()) and its normalization parameter.

SELECT v, ts_rank(to_tsvector('english', v), to_tsquery('english', 'productivity')) rank,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity')) rankcd,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity'), 4) rankcd4,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity'), 6) rankcd6
FROM (
  VALUES ('production of organic products'::TEXT),
         ('labour productivity'),
         ('labour productivity with more unrelated words'),
         ('labour productivity with more unrelated words and again production'),
         ('production of productivity'),
         ('product production')
) d(v)

SQLFiddle

However, I'm not sure how to set up pg_search_scope with custom normalization.

Alternatively, you might be interested in trigram searches as well, they offer more exact (character by character) results.

Carrew answered 27/6, 2014 at 10:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.