PostgreSQL - making ts_rank take the tsvector positions as-is or defining a custom ts_rank function

I'm performing a weighted search on a series of items in an e-commerce platform. The problem is that ts_rank gives me exactly the same value for different combinations of words, even when to_tsvector assigns different positions to the words in each combination.

Let me illustrate this with an example:

If I pass the word camas to to_tsvector, I get the following:

'cam':1

If I pass the words sofas camas to to_tsvector, I get the following:

'cam':2 'sof':1

So camas gets a different position depending on the word combination.

When I execute the following statement:

select ts_rank(to_tsvector('spanish', 'camas'),to_tsquery('spanish','cama'));

PostgreSQL gives me 0.0607927 as the ts_rank computed value, whereas the computed value for the following statement:

select ts_rank(to_tsvector('spanish', 'sofas camas'),to_tsquery('spanish','cama'));

is the same value: 0.0607927.

How can this be?

My question is: can ts_rank be made to take into account the word positions stored in the tsvector as-is, or can I define a custom ts_rank function that uses those positions as described above?

Any help would be greatly appreciated.

Appointive answered 20/4, 2016 at 9:19

As the documentation says about the functions ts_rank and ts_rank_cd:

they consider how often the query terms appear in the document, how close together the terms are in the document, and how important is the part of the document where they occur

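That is, these functions ignore words that are not part of the query when calculating the rank, so a single query term gets the same rank whether it sits at position 1 or position 2. You do get different results when the distance between query terms or the number of their occurrences changes, as in these queries: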

postgres=# select ts_rank(to_tsvector('spanish', 'famoso sofas camas'),to_tsquery('spanish','famoso & cama'));
  ts_rank  
-----------
 0.0985009
(1 row)

postgres=# select ts_rank(to_tsvector('spanish', 'famoso camas'),to_tsquery('spanish','famoso & cama'));
  ts_rank  
-----------
 0.0991032
(1 row)

postgres=# select ts_rank(to_tsvector('spanish', 'sofas camas camas'),to_tsquery('spanish','cama'));
  ts_rank  
-----------
 0.0759909
(1 row)

The documentation also says:

Different applications might require additional information for ranking, e.g., document modification time. The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.

You can get the PostgreSQL source code from GitHub. The function you need as a starting point is ts_rank_tt (in src/backend/utils/adt/tsrank.c).
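
If you would rather not write C, a custom ranking can also be sketched in plain SQL by combining ts_rank with the positions stored in the tsvector. The function below is only an illustration: the name position_aware_rank, its parameters, and the choice of dividing by the earliest matching position are assumptions of mine, not something PostgreSQL provides. It relies on unnest(tsvector), which is available from PostgreSQL 9.6.

```sql
-- Hypothetical helper (not part of PostgreSQL): divides ts_rank by the
-- earliest position at which any search term occurs in the document.
-- The 1/position decay is an arbitrary choice made for illustration.
CREATE OR REPLACE FUNCTION position_aware_rank(
    doc   tsvector,
    terms text,
    cfg   regconfig DEFAULT 'spanish'::regconfig
)
RETURNS real
LANGUAGE sql IMMUTABLE AS $$
    SELECT (ts_rank(doc, plainto_tsquery(cfg, terms))
            / COALESCE(min(pos), 1))::real   -- no match: divide by 1
    FROM unnest(doc) AS d(lexeme, positions, weights)
    -- Normalize the search terms with the same configuration so that
    -- their lexemes can be compared with the document's lexemes.
    JOIN unnest(to_tsvector(cfg, terms)) AS t(lexeme, positions, weights)
      ON d.lexeme = t.lexeme
    -- Expand the position array of every matching lexeme.
    CROSS JOIN LATERAL unnest(d.positions) AS pos;
$$;

-- Match at position 1: same value as plain ts_rank.
SELECT position_aware_rank(to_tsvector('spanish', 'camas'), 'cama');

-- Match at position 2: plain ts_rank divided by 2.
SELECT position_aware_rank(to_tsvector('spanish', 'sofas camas'), 'cama');
```

Because ts_rank returns the same value for both documents in the question, the second call should come out at half of the first one. Other decay functions (for example a logarithmic one) can be swapped in the same place.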

Spank answered 25/4, 2016 at 9:44

You can also change the normalization option so that the rank takes the document length into account, which is ignored by default.

For example, if you pass 1 as the third argument, the rank is divided by 1 + the logarithm of the document length. With your example:

postgres=# select ts_rank(to_tsvector('spanish', 'camas'),to_tsquery('spanish','camas'), 1); 
  ts_rank  
-----------
 0.0607927
(1 row)

postgres=# select ts_rank(to_tsvector('spanish', 'sofas camas'),to_tsquery('spanish','camas'), 1); 
  ts_rank  
-----------
 0.0383559
(1 row)
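
To see the effect of the normalization argument side by side, something like the following query can be used. It is a sketch that reuses the two documents from the question. The column aliases are my own naming, and the flag meanings (0 ignores the length, 1 divides by 1 + the logarithm of the length, 2 divides by the length) are taken from the documentation linked below.

```sql
-- Compare normalization flags for the two documents from the question.
SELECT doc,
       ts_rank(to_tsvector('spanish', doc), q, 0) AS no_normalization, -- default
       ts_rank(to_tsvector('spanish', doc), q, 1) AS by_log_length,
       ts_rank(to_tsvector('spanish', doc), q, 2) AS by_length
FROM (VALUES ('camas'), ('sofas camas')) AS docs(doc),
     to_tsquery('spanish', 'camas') AS q;
```

The flags can be combined with | (for example 1|2) if more than one normalization should be applied.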

Documentation: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING

Symbolic answered 3/12, 2018 at 11:23