Escaping special characters in to_tsquery
Asked Answered
E

3

26

How do you espace special characters in string passed to to_tsquery? For instance, this kind of query:

select to_tsquery('AT&T');

Produces:

NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored

 to_tsquery 
------------

(1 row)

Edit: I also noticed that there is the same issue in to_tsvector.

Electrophilic answered 31/12, 2012 at 16:29 Comment(0)
J
11

A simple solution is to create the tsquery as follows:

select $$'AT&T'$$::tsquery;

You can make more complex queries:

select $$'AT&T' & Phone | '|Bang!'$$::tsquery;

See the text search docs for more.

Jayme answered 27/2, 2016 at 11:32 Comment(1)
Caution: this won't apply the various dictionary transformations performed by to_tsquery, such as stemming, synonym grouping and stopword removal. This may mean that some queries will never find a match.Herpes
C
12

I found this comment very useful that uses the plainto_tsquery('AT&T) function https://mcmap.net/q/537198/-psqlexception-error-syntax-error-in-tsquery

Catboat answered 23/1, 2017 at 11:10 Comment(3)
Is there anyway to support partial matching e.g. adding :* onto the end with this?Impunity
@ColinD I'm sorry, but have you found a way to do this?Beckmann
I left the project that I was using for this but I don't remember achieving the solution, my final query was like this query = "SELECT o.ortholog_id, g.gene_id, g.species_id, g.description, g.symbol FROM search_index s JOIN orthologs o on o.ortholog_id = s.ortholog_id JOIN genes g on o.gene_id = g.gene_id WHERE s.document @@ to_tsquery('english', ?search) ORDER BY ts_rank(s.document, to_tsquery('english', ?search)) DESC, o.ortholog_id LIMIT 300"; and if I remember correctly the user could manually add * to the queryImpunity
J
11

A simple solution is to create the tsquery as follows:

select $$'AT&T'$$::tsquery;

You can make more complex queries:

select $$'AT&T' & Phone | '|Bang!'$$::tsquery;

See the text search docs for more.

Jayme answered 27/2, 2016 at 11:32 Comment(1)
Caution: this won't apply the various dictionary transformations performed by to_tsquery, such as stemming, synonym grouping and stopword removal. This may mean that some queries will never find a match.Herpes
P
3

If you want 'AT&T' to be treated as a search word, you're going to need some customised components, because the default parser splits it as two words:

steve@steve@[local] =# select * from ts_parse('default', 'AT&T');
 tokid | token 
-------+-------
     1 | AT
    12 | &
     1 | T
(3 rows)
steve@steve@[local] =# select * from ts_debug('simple', 'AT&T');
   alias   |   description   | token | dictionaries | dictionary | lexemes 
-----------+-----------------+-------+--------------+------------+---------
 asciiword | Word, all ASCII | AT    | {simple}     | simple     | {at}
 blank     | Space symbols   | &     | {}           |            | 
 asciiword | Word, all ASCII | T     | {simple}     | simple     | {t}
(3 rows)

As you can see from the documentation for CREATE TEXT PARSER this is not very trivial, as the parser appears to need to be a C function.

You might find this post of someone getting "underscore_word" to be recognised as a single token useful: http://postgresql.1045698.n5.nabble.com/Configuring-Text-Search-parser-td2846645.html

Peg answered 31/12, 2012 at 16:51 Comment(1)
second link is deadHypervitaminosis

© 2022 - 2024 — McMap. All rights reserved.