Strange behavior with tsquery in PostgreSQL with prefix-lexemes
When I use 'a:*' as the query (the same happens with 'i:*', 's:*' and 't:*'):

SELECT id FROM mv_fulltextsearch1 WHERE to_tsvector(text) @@ to_tsquery('a:*') LIMIT 50;

the query takes forever and PostgreSQL prints the following notice many times:

NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored

But when I use 'b:*' (or any other single letter in front of ':*'):

SELECT id FROM mv_fulltextsearch1 WHERE to_tsvector(text) @@ to_tsquery('b:*') LIMIT 50;

everything works as expected.

Are a, i, s and t some kind of special characters? How can I escape them or otherwise fix this behavior?

Primine answered 30/1, 2018 at 14:53

Use to_tsvector('simple', text) and to_tsquery('simple', 'a:*').

The reason is that the 'english' text search configuration removes stop words, and "a" is considered a stop word.

The 'simple' configuration, by contrast, does not remove stop words.
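
To see the difference directly, you can compare what each configuration produces for the same input. This is only a minimal sketch: it assumes your default configuration is 'english', it reuses the table and column names from the question, and the index name is hypothetical.

```sql
-- 'english' discards the stop word "a": the result is an empty tsquery,
-- accompanied by the NOTICE shown in the question.
SELECT to_tsquery('english', 'a:*');

-- 'simple' keeps every token, so the prefix query survives.
SELECT to_tsquery('simple', 'a:*');

-- Use the same configuration on both sides of @@.
SELECT id
FROM mv_fulltextsearch1
WHERE to_tsvector('simple', text) @@ to_tsquery('simple', 'a:*')
LIMIT 50;

-- Optional: an expression index that matches the WHERE clause above
-- (the index name is hypothetical).
CREATE INDEX mv_fulltextsearch1_simple_idx
    ON mv_fulltextsearch1
    USING gin (to_tsvector('simple', text));
```

Keep in mind that 'simple' also skips stemming, so matching becomes more literal than with 'english'. As for the slowness, my guess is that an empty query matches no row, so the LIMIT 50 is never satisfied and the whole table gets scanned, with one NOTICE per evaluated row.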

Marnamarne answered 30/1, 2018 at 19:8

From the documentation at https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

"Also, * can be attached to a lexeme to specify prefix matching"

and, in the same section:

"while basic tsquery input takes the tokens at face value, to_tsquery normalizes each token into a lexeme using the specified or default configuration, and discards any tokens that are stop words according to the configuration."

This leads me to conclude that your to_tsquery discards a and i as stop words, leaving no text to query (see the example with the rat and the cat in the docs linked above).

(Please don't ask me why t is a stop word.)

For example, if you cast the string to tsquery directly (no to_tsquery, so stop words are not discarded):

with c(t) as (values('a an also at bond'),('but by illegal'),('I in it aligator'))
select t,to_tsvector(t) @@ ('a:*')::tsquery from c;

         t         | ?column?
-------------------+----------
 a an also at bond | t
 but by illegal    | f
 I in it aligator  | t
(3 rows)

it works.

For reference, the stop words are listed in the configuration's stop-word file:

-bash-4.2$ grep "^t$" /usr/share/pgsql93/tsearch_data/english.stop
t

So t is indeed listed, though my modest knowledge of English does not tell me why.
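
If you cannot read the stop-word file on the server, the same check can be sketched from SQL. This assumes the built-in 'english' configuration and its 'english_stem' dictionary are installed, which is the case on a default installation.

```sql
-- An empty array ({}) means the dictionary treats the token as a stop word.
SELECT ts_lexize('english_stem', 'a');
SELECT ts_lexize('english_stem', 't');

-- A token that is not a stop word comes back as its normalized lexeme.
SELECT ts_lexize('english_stem', 'b');

-- ts_debug shows, token by token, what a configuration does with a string.
SELECT alias, token, lexemes
FROM ts_debug('english', 'a b t');
```

As for why s and t are on the list at all, one plausible explanation (a guess on my part, not something I have verified) is that they are what remains when contractions such as "it's" or "don't" are split at the apostrophe.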

Mathamathe answered 30/1, 2018 at 17:26
