Why does proximity affect ts_rank on multi-term queries?

When I use ts_rank with a tsquery that contains multiple terms joined by the & operator, the proximity of the terms affects the ranking and produces results I wasn't expecting. An example:

select ts_rank(to_tsvector('why in the world is this not working?'), plainto_tsquery('world working'));
RESULT: 0.095243

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world working'));
RESULT: 0.0397712

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working? I just do not get it'), plainto_tsquery('world working'));
RESULT: 0.0397712
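
To see what ts_rank is actually working with, it can help to look at the lexeme positions stored in each tsvector and at how the query is parsed. This is a minimal sketch, assuming the 'english' text search configuration (the queries above do not name one and rely on default_text_search_config instead):

```sql
-- Show the lexemes and their positions for the first two documents.
-- Under the 'english' configuration (assumed here) stop words are dropped,
-- but the remaining lexemes keep their original word positions.
select to_tsvector('english', 'why in the world is this not working?');
select to_tsvector('english', 'why in the world is this not - at least as I would expect - working?');

-- Show how the query is parsed; plainto_tsquery joins the terms with &.
select plainto_tsquery('english', 'world working');
```

Comparing the position of the second query term across the two documents shows how far apart the matching lexemes are in each case.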

In the documentation, ts_rank is described as simply measuring the frequency of matching lexemes:

ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ]) returns float4
Ranks vectors based on the frequency of their matching lexemes.

However, the examples above suggest that it measures frequency and, in the case of multi-term queries, proximity as well.

This creates unexpected results for me in the example below:

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world'));
RESULT: 0.0607927

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world working'));
RESULT: 0.0397712

I would expect the document to rank more highly in the second query because it matches multiple terms in the query, but instead it ranks lower.
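
One way to check whether the & operator is what lowers the score is to rank the same document against the single-term query and the explicit & and | forms side by side. This is only a sketch for comparison: to_tsquery with the | operator is swapped in here and is not one of the original queries, and the column aliases are illustrative names.

```sql
-- Same document, three query forms.
-- to_tsquery('world & working') parses to the same & query that
-- plainto_tsquery('world working') produces.
select
  ts_rank(doc, to_tsquery('world'))           as single_term,
  ts_rank(doc, to_tsquery('world & working')) as and_query,
  ts_rank(doc, to_tsquery('world | working')) as or_query
from (
  select to_tsvector('why in the world is this not - at least as I would expect - working?') as doc
) as t;
```

If only the and_query column drops as the terms move apart, that would point at how & queries are scored rather than at the number of matching terms.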

Is there a way to prevent this behavior? Is there something I am misunderstanding about ts_rank or how to use it?

Aquatint answered 10/9, 2017 at 4:22

Comment (1):
Why in the world is there no answer to this question? (Clovah)
