How to get ts_headline to respect phraseto_tsquery

I have a query that uses phrase search to match whole phrases.

SELECT ts_headline(
  'simple',
  'This is my test text. My test text has many words. Well, not THAT many words.',
  phraseto_tsquery('simple', 'text has many words')
);

Which results in:

This is my test <b>text</b>. My test <b>text</b> <b>has</b> <b>many</b> <b>words</b>. Well, not THAT <b>many</b> <b>words</b>.

But I would have expected this:

This is my test text. My test <b>text</b> <b>has</b> <b>many</b> <b>words</b>. Well, not THAT many words.

Or ideally even this:

This is my test text. My test <b>text has many words</b>. Well, not THAT many words.

Sidenote:

phraseto_tsquery('simple', 'text has many words')

is equivalent to

to_tsquery('simple', 'text <-> has <-> many <-> words')
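The equivalence in the sidenote can be checked directly. Below is a small query, assuming PostgreSQL with the built-in 'simple' configuration, that compares the two generated tsquery values as text:

```sql
-- Both expressions should yield the phrase query 'text' <-> 'has' <-> 'many' <-> 'words'
SELECT phraseto_tsquery('simple', 'text has many words')         AS from_phrase,
       to_tsquery('simple', 'text <-> has <-> many <-> words')   AS from_operators,
       phraseto_tsquery('simple', 'text has many words')::text
         = to_tsquery('simple', 'text <-> has <-> many <-> words')::text AS identical;  -- true
```

This only exercises query construction; it says nothing about how ts_headline marks up the text.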

I'm not sure if I'm doing something wrong, or if ts_headline simply does not support this kind of highlighting.

Figone asked 26/3, 2018 at 16:9

phraseto_tsquery('simple', 'text has many words') generates the correct query; the problem seems to be in the ts_headline function. This looks like an already reported bug, BUG #155172.
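One way to see that the query side is fine: the @@ match operator does honour <-> adjacency on the same text. A quick check, assuming the 'simple' configuration; the second phrase, 'not many words', is an arbitrary example added only for contrast:

```sql
-- With the 'simple' configuration the 16 words get positions 1..16.
-- "text has many words" occupies positions 8-11, so the phrase matches.
-- "not" (13) is followed by "that" (14), not "many", so the second phrase does not.
SELECT doc @@ phraseto_tsquery('simple', 'text has many words') AS full_phrase,   -- true
       doc @@ phraseto_tsquery('simple', 'not many words')      AS not_adjacent   -- false
FROM (SELECT to_tsvector('simple',
        'This is my test text. My test text has many words. Well, not THAT many words.') AS doc) AS t;
```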

Lobule answered 23/9, 2020 at 4:3

I am writing an extension that improves the ts_headline functionality so that it correctly highlights matching phrases with a single tag, while not highlighting partial matches. The extension is available at https://github.com/thevermeer/pg_ts_semantic_headline and is intended as a direct replacement for ts_headline.

Usage:

SELECT ts_semantic_headline(
  'simple',
  'This is my test text. My test text has many words. Well, not THAT many words.',
  phraseto_tsquery('simple', 'text has many words')
);

which produces:

ts_semantic_headline
This is my test text. My test <b>text has many words.</b> Well, not THAT many words.

The ts_semantic_headline solution uses ts_headline under the hood to produce content fragments, and then uses text parsing and customized TSVectors, along with the included ts_fast_headline function, to perform multi-word highlighting at a minimal (5-10%) performance cost on top of ts_headline.
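If installing an extension is not an option, the "one tag per phrase" part of the desired output can be roughly approximated by post-processing ts_headline output with regexp_replace. This is a minimal sketch only, not how ts_semantic_headline works internally: it assumes the default <b>/</b> highlight tags and standard_conforming_strings = on (the default), and it merges adjacent highlighted words but does not remove partial matches. Here it is applied to the ts_headline output quoted in the question:

```sql
-- Collapse "</b><whitespace><b>" so consecutive highlighted words share one tag.
SELECT regexp_replace(
  'This is my test <b>text</b>. My test <b>text</b> <b>has</b> <b>many</b> <b>words</b>. Well, not THAT <b>many</b> <b>words</b>.',
  '</b>(\s+)<b>',  -- closing tag, whitespace, opening tag
  '\1',            -- keep only the captured whitespace
  'g'              -- replace every occurrence
);
-- This is my test <b>text</b>. My test <b>text has many words</b>. Well, not THAT <b>many words</b>.
```

Against real data the same call would wrap ts_headline, e.g. regexp_replace(ts_headline('simple', body, query), '</b>(\s+)<b>', '\1', 'g'), where body and query are placeholder names. Note that the trailing "many words" stays highlighted; filtering out such partial matches is the part the extension addresses.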

If performance is of any concern, the ts_fast_headline function can also use 2 pre-processed columns (TSPVector + TEXT[]), and deliver highlighted content 5X-10X faster than ts_headline.

Sortie answered 18/3 at 19:49
