suggest like google with postgresql trigrams and full text search
Asked Answered
C

2

9

I want to do a text search like google suggestions.

I'm using PostgreSQL because of the magical Postgis.

I was thinking on using FTS, but I saw that it could not search partial words, so I found this question, and saw how trigrams works.

The main problem is that the search engine I'm working on is for spanish language. FTS worked great with stemming and dictionaries (synonyms, misspells), UTF and so on. Trigrams worked great for partial words, but they only work for ASCII, and (obviously) they don't use things like dictionaries.

I was thinking if is there any way in which the best things from both could be used.

Is it possible make Full Text Search and Trigrams to work together in PGSQL?

Cassino answered 16/5, 2012 at 15:44 Comment(0)
F
9

You can do this in Postgres, and don't need Lucene.

You can quote phrases in tsquery or tsvector like the below. You can add a :* after a tsquery term to do a prefix search:

select
'''new york city'''::tsvector   @@ '''new yo'':*'::tsquery, --true
'''new york times'''::tsvector  @@ '''new yo'':*'::tsquery, --true
'''new york'''::tsvector        @@ '''new yo'':*'::tsquery, --true
'''new'''::tsvector             @@ '''new yo'':*'::tsquery, --false
'new'::tsvector                 @@ '''new yo'':*'::tsquery, --false
'new york'::tsvector            @@ '''new yo'':*'::tsquery  --false

The main problem is that to_tsvector() and [plain]to_tsquery() will strip your quotes. You can write your own versions that don't do this (it's not that hard), or do some post-processing after them to build your term n-grams.

The extra single quotes above are just escapes. select $$ i heart 'new york city' $$::tsvector; is equivalent.

Falsity answered 29/7, 2014 at 6:5 Comment(2)
wow! can you please add a reference to documentation regarding this triple quote?Cassino
the triple quote is actually normal SQL, i'm just escaping a single quote with another. The docs mention it here: postgresql.org/docs/9.3/static/textsearch-controls.htmlFalsity
A
-1

I would recommend having a look at Lucene. It can be integrated natively in Java, easily in .NET or using SOLR and web services in php.

It has great capabilities for free text searches, term ranking out of the box, support for different languages using different Analysers (link for the spanish one).

Last but not least, it's also extremely fast (for big volumes, say 4Gb index ~ 5 000 000 rows in a DB, it's much faster than a Postgres database).

Assortment answered 17/8, 2012 at 20:28 Comment(2)
I will go with a similar solution, using elasticsearch, but I cannot accept, because it doesn't answers the question (in case someone arrives here looking for the answer). Thanks!Cassino
the poster specifically asked for a solution with postgresCotyledon

© 2022 - 2024 — McMap. All rights reserved.