Prefix/wildcard searches with 'websearch_to_tsquery' in PostgreSQL Full Text Search?
Asked Answered
M

6

11

I'm currently using the websearch_to_tsquery function for full text search in PostgreSQL. It all works well except for the fact that I no longer seem to be able to do partial matches.

SELECT ts_headline('english', q.\"Content\", websearch_to_tsquery('english', {request.Text}), 'MaxFragments=3,MaxWords=25,MinWords=2') Highlight, *
FROM (
    SELECT ts_rank_cd(f.\"SearchVector\", websearch_to_tsquery('english', {request.Text})) AS Rank, *
    FROM public.\"FileExtracts\" f, websearch_to_tsquery('english', {request.Text}) as tsq
    WHERE f.\"SearchVector\" @@ tsq
    ORDER BY rank DESC
) q

Searches for customer work but cust* and cust:* do not.

I've had a look through the documentation and a number of articles but I can't find a lot of info on it. I haven't worked with it before so hopefully it's just something simple that I'm doing wrong?

Merciless answered 5/10, 2020 at 1:13 Comment(3)
I'm looking for the same solution. Any new result here?Lilia
@Lilia I ended up just using Elasticsearch but would be keen to hear if there's a clean way to do this with PostgreSQL.Merciless
@ChrisOwens I'm pretty late to the party but you can simply use both websearch_to_tsqueryand to_tsquery together. See my answer for a quick example.Fishwife
C
5

You can't do this with websearch_to_tsquery but you can do it with to_tsquery (because ts_query allows to add a :* wildcard) and add the websearch syntax yourself in in your backend.

For example in a node.js environment you could do smth. like this:

let trimmedSearch = req.query.search.trim()
let searchArray = trimmedSearch.split(/\s+/) //split on every whitespace and remove whitespace
let searchWithStar = searchArray.join(' & ' ) + ':*'  //join word back together adds AND sign in between an star on last word
let escapedSearch = yourEscapeFunction(searchWithStar)

and than use it in your SQL

search_column @@ to_tsquery('english', ${escapedSearch})
Caviar answered 9/4, 2021 at 9:42 Comment(1)
Great answer ! I found that you could also simply use websearch_to_tsquery and to_tsquery together. See my answer for a quick example.Fishwife
F
5

To add something on top of the other good answers here, you can also compose your query with both websearch_to_tsquery and to_tsquery to have everything from both worlds:

select * from your_table where ts_vector_col @@ to_tsquery('simple', websearch_to_tsquery('simple', 'partial query')::text || ':*')
Fishwife answered 30/10, 2022 at 18:27 Comment(1)
In this example if 'simple' is 'english' (or any other language) basic words are filtered like "in", "the", etc. This would result in the following error: Code: `42601`. Message: `db error: ERROR: syntax error in tsquery: ":*"Bodycheck
T
2

Another solution I have come up with is to do the text transform as part of the query so building the tsquery looks like this

to_tsquery(concat(regexp_replace(trim('  all the search terms here  '), '\W+', ':* & '), ':*'));
  • (trim) Removes leading/trailing whitespace
  • (regexp_replace) Splits the search string on non word chars and adds trailing wildcards to each term, then ANDs the terms (:* & )
  • (concat) Adds a trailing wildcard to the final term
  • (to_tsquery) Converts to a ts_query

You can test the string manipulation by running

SELECT concat(regexp_replace(trim('  all the search terms here  '), '\W+', ':* & ', 'gm'), ':*')

the result should be

all:* & the:* & search:* & terms:* & here:*

So you have multi word partial matches e.g. searching spi ma would return results matching spider man

Timpani answered 22/6, 2022 at 11:9 Comment(0)
P
1

You need to write the tsquery directly if you want to use partial matching. plainto_tsquery doesn't pass through partial match notation either, so what were you doing before you switched to websearch_to_tsquery?

Anything that applies a stemmer is going to have hard time handling partial match. What is it supposed to do, take off the notation, stem the part, then add it back on again? Not do stemming on the whole string? Not do stemming on just the token containing the partial match indicator? And how would it even know partial match was intended, rather than just being another piece of punctuation?

Polyhydroxy answered 5/10, 2020 at 2:49 Comment(0)
K
1

The below function enables to use websearch_to_tsquery with supported syntax: quoted text, OR, dash..
Along with appending the text query words with wildcard search possibility.

This can be used in case you always expect users to input a start of a words in the search query.

SELECT websearch_to_wildcard_tsquery('a or -b'); prepares query to match all words starting a OR not starting b.

CREATE OR REPLACE FUNCTION websearch_to_wildcard_tsquery(text_query text)
RETURNS tsquery AS $$
    DECLARE
        query_splits text[];
        split text;
        new_text_query text := '';
    BEGIN
        SELECT regexp_split_to_array(d::text, '\s* \s*') INTO query_splits FROM websearch_to_tsquery('simple', text_query) d;
        FOREACH split IN ARRAY query_splits LOOP
            CASE WHEN split = '|' OR split = '&' OR split = '!' OR split = '<->' OR split = '!('
                THEN new_text_query := new_text_query || split || ' ';
            ELSE new_text_query := new_text_query || split || ':* ';
            END CASE;
        END LOOP;
        RETURN to_tsquery('simple', new_text_query);
    END;
$$ LANGUAGE plpgsql;

If it may be useful to someone please use it freely.

Kevinkevina answered 29/5, 2023 at 9:43 Comment(0)
M
0

After many attempts I managed to find the simplest solution yet. It has a similarity with James Mudd's answer, but it's cleaner, supports double quoted search terms and makes syntax errors impossible. Enjoy:

SELECT to_tsvector('simple', 'a good option') @@
  -- The query «"goo opt"» is converted to: «'goo' <-> 'opt' »
  -- and finally becomes: «'goo':*<-> 'opt':*»
  replace(
    websearch_to_tsquery('simple', '"goo opt"')::text || ' ',
    ''' ',
    ''':*'
  )::tsquery
Mistrust answered 8/6 at 10:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.