I'm running some complex FTS queries in Postgres 13.4 and discovered some behavior in ts_headline
that's unexpected, at least to me, and I'm not sure whether I've come across a feature or a bug. ;)
Initial sanity check:
SELECT plainto_tsquery('english', 'red dog') @@ to_tsvector('The quick brown fox jumped over the lazy dog.');
-- false
No surprises: the tsquery evaluates to 'red' & 'dog', the document does not contain 'red', so there is no match. But when I try to get headlines:
SELECT ts_headline('The quick brown fox jumped over the lazy dog.', plainto_tsquery('english', 'red dog'));
-- The quick brown fox jumped over the lazy <b>dog</b>.
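For reference, the two operands of the sanity check can be inspected on their own. This is a minimal sketch that assumes the 'english' configuration is passed explicitly to to_tsvector (the calls above rely on the default configuration instead):

```sql
-- The query side: both lexemes are required.
SELECT plainto_tsquery('english', 'red dog');
-- 'red' & 'dog'

-- The document side: stop words are dropped, and there is no 'red' lexeme.
SELECT to_tsvector('english', 'The quick brown fox jumped over the lazy dog.');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
```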
The same happens with the FOLLOWED_BY operator (<->); you can replace plainto_tsquery with phraseto_tsquery (or construct your own tsquery literals). It still highlights fragments that aren't actually matches.
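A sketch of the hand-written variant, assuming to_tsquery with the 'english' configuration as the way to build the phrase query:

```sql
-- Match check with a hand-written phrase query; the document has no 'red'.
SELECT to_tsquery('english', 'red <-> dog')
       @@ to_tsvector('english', 'The quick brown fox jumped over the lazy dog.');
-- false

-- The same query handed to ts_headline, to compare against the match result.
SELECT ts_headline('english',
                   'The quick brown fox jumped over the lazy dog.',
                   to_tsquery('english', 'red <-> dog'));
```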
The problem is not (or at the very least, not entirely) a result of trying to call ts_headline without having a real match. My original situation was actually more like:
SELECT ts_headline('I want a red dog, but not a black dog. No red cats, either.', phraseto_tsquery('english', 'red dog'));
-- I want a <b>red</b> <b>dog</b>, but not a black <b>dog</b>. No <b>red</b> cats, either.
In that case, given that the tsquery evaluates to 'red' <-> 'dog' (i.e., "red" immediately followed by "dog"), I'm surprised by the last two highlights.
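To confirm that this second document is a real match for the phrase query, the same pieces can be checked directly. A small sketch, again assuming the 'english' configuration on both sides:

```sql
-- The phrase query: 'red' immediately followed by 'dog'.
SELECT phraseto_tsquery('english', 'red dog');
-- 'red' <-> 'dog'

-- The document contains "red dog" as adjacent words, so the match succeeds.
SELECT phraseto_tsquery('english', 'red dog')
       @@ to_tsvector('english', 'I want a red dog, but not a black dog. No red cats, either.');
-- true
```

So there is exactly one place where the phrase matches, yet ts_headline marks four words.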
The documentation for ts_headline says:
Displays, in an abbreviated form, the match(es) for the query in the document,
which leads me to believe this is a bug, but the longer blurb on highlighting results only says the function
returns an excerpt from the document in which terms from the query are highlighted
and indeed, terms from the query are highlighted...
I've tried messing with the options argument to ts_headline, and nothing has changed this behavior.
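For illustration, a call of this kind looks like the following. The option names are the documented ts_headline options; the specific values are placeholders, not a record of the exact combinations tried:

```sql
-- Custom delimiters plus fragment-based headlines; values are illustrative only.
SELECT ts_headline('english',
                   'I want a red dog, but not a black dog. No red cats, either.',
                   phraseto_tsquery('english', 'red dog'),
                   'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=<<, StopSel=>>');
```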
So... Am I calling it wrong, is it wrong, or do I just have wrong expectations for its behavior?
(This seems tangentially related to this older question, but that one appears to describe a different bug. I can't tell what exactly is or isn't happening, or is supposed to happen, in that scenario, so who knows if it's related or not.)
ts_headline('english', 'The quick brown fox jumped over the lazy dog.')? – Poliomyelitis