PostgreSQL ts_headline not matching ts_query correctly

When running the following query:

select ts_headline(
    $$Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean vitae mauris ac metus pellentesque malesuada. In hac habitasse platea dictumst. Quisque malesuada purus sit amet facilisis aliquam. Pellentesque pharetra erat urna, sit amet tincidunt lectus malesuada vitae. Proin eros metus, fringilla non tortor quis, cursus vulputate felis. Vestibulum laoreet vel urna at blandit. Donec eu enim rutrum, lobortis orci ut, elementum felis. Nunc iaculis ex quis dolor commodo auctor. Suspendisse eleifend tellus nulla, et semper enim varius ut. Donec eu ante pharetra, convallis dui in, vehicula nibh. Nullam quis arcu mattis, suscipit sapien id, ullamcorper urna.$$,
    to_tsquery('Lorem') && phraseto_tsquery('ullamcorper urna'),
    'StartSel=#$#, StopSel=#$#, FragmentDelimiter=$#$, MaxFragments=100, MaxWords=40000, MinWords=5'
);

Instead of a full fragment, I only get #$#Lorem#$# ipsum dolor sit amet as the result. The documentation says:

If not all query words are found in the document, then a single fragment of the first MinWords in the document will be displayed.

However, running the following confirms that the tsquery does match the document:

select
    to_tsvector($$Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean vitae mauris ac metus pellentesque malesuada. In hac habitasse platea dictumst. Quisque malesuada purus sit amet facilisis aliquam. Pellentesque pharetra erat urna, sit amet tincidunt lectus malesuada vitae. Proin eros metus, fringilla non tortor quis, cursus vulputate felis. Vestibulum laoreet vel urna at blandit. Donec eu enim rutrum, lobortis orci ut, elementum felis. Nunc iaculis ex quis dolor commodo auctor. Suspendisse eleifend tellus nulla, et semper enim varius ut. Donec eu ante pharetra, convallis dui in, vehicula nibh. Nullam quis arcu mattis, suscipit sapien id, ullamcorper urna.$$)
 @@ (to_tsquery('Lorem') && phraseto_tsquery('ullamcorper urna'));

Why is ts_headline not able to find the fragment?
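
The result is exactly five words long, which equals MinWords, so it may be the fallback described in the quoted documentation. A rough way to check, assuming a shortened stand-in for the text above (the abbreviated document and the value 8 are illustrative choices, not part of the original query):

```sql
-- Same query and options as above, but with MinWords raised from 5 to 8
-- and a shortened, hypothetical version of the document.
select ts_headline(
    $$Lorem ipsum dolor sit amet. Pellentesque pharetra erat urna, sit amet tincidunt lectus. Nullam quis arcu mattis, ullamcorper urna.$$,
    to_tsquery('Lorem') && phraseto_tsquery('ullamcorper urna'),
    'StartSel=#$#, StopSel=#$#, FragmentDelimiter=$#$, MaxFragments=100, MaxWords=40000, MinWords=8'
);
```

If the output grows to the first eight words of the document, that would suggest ts_headline is taking the fallback path rather than building a fragment around the match.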

Alack answered 29/11, 2019 at 10:22

This looks like a bug introduced when phraseto_tsquery was first added. My guess is that ts_headline finds the first 'urna', sees that it is not adjacent to 'ullamcorper', and then fails to keep searching for later occurrences of 'urna'.

Please submit a bug report. You might want to make your example several times smaller first.

This seems to be sufficient to reproduce it:

select ts_headline(
    $$Lorem ipsum urna.  Nullam  nullam ullamcorper urna.$$,
    to_tsquery('Lorem') && phraseto_tsquery('ullamcorper urna'),
    'StartSel=#$#, StopSel=#$#, FragmentDelimiter=$#$, MaxFragments=100, MaxWords=40000, MinWords=5'
);
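
To probe the guess about the first 'urna', it may help to look at the lexeme positions and at a control document without the early 'urna'. This is only a sketch: it relies on whatever default_text_search_config is active, the control text is a made-up variant of the example, and the results will depend on the server version.

```sql
-- Lexeme positions: shows where 'urna' occurs relative to 'ullamcorper'.
select to_tsvector($$Lorem ipsum urna.  Nullam  nullam ullamcorper urna.$$);

-- The combined query as the text search parser sees it.
select to_tsquery('Lorem') && phraseto_tsquery('ullamcorper urna');

-- Control (hypothetical variant): the same text without the first 'urna'.
select ts_headline(
    $$Lorem ipsum.  Nullam  nullam ullamcorper urna.$$,
    to_tsquery('Lorem') && phraseto_tsquery('ullamcorper urna'),
    'StartSel=#$#, StopSel=#$#, FragmentDelimiter=$#$, MaxFragments=100, MaxWords=40000, MinWords=5'
);
```

If the control highlights the phrase while the original example does not, that would be consistent with the guess and worth including in the bug report.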
Deipnosophist answered 29/11, 2019 at 20:19
