PostgreSQL: Find sentences closest to a given sentence
Asked Answered
R

2

6

I have a table of images with sentence captions. Given a new sentence I want to find the images that best match it based on how close the new sentence is to the stored old sentences.

I know that I can use the @@ operator with a to_tsquery but tsquery accepts specific words as queries.

One problem is I don't know how to convert the given sentence into a meaningful query. The sentence may have punctuation and numbers.

However, I also feel that some kind of cosine similarity thing is what I need but I don't know how to get that out of PostgresQL. I am using the latest GA version and am happy to use the development version if that would solve my problem.

Ryle answered 5/1, 2016 at 3:29 Comment(0)
F
6

Full Text Search (FTS)

You could use plainto_tsquery() to (per documentation) ...

produce tsquery ignoring punctuation

SELECT plainto_tsquery('english', 'Sentence: with irrelevant words (and punctuation) in it.')

 plainto_tsquery
------------------
 'sentenc' & 'irrelev' & 'word' & 'punctuat'

Use it like:

SELECT *
FROM   tbl
WHERE  to_tsvector('english', sentence) @@ plainto_tsquery('english', 'My new sentence');

But that is still rather strict and only provides very limited tolerance for similarity.

Trigram similarity

Might be better suited to search for similarity, even overcome typos to some degree.

Install the additional module pg_trgm, create a GiST index and use the similarity operator % in a nearest neighbour search:

Basically, with a trigram GiST index on sentence:

-- SELECT set_limit(0.3);  -- adjust tolerance if needed

SELECT *
FROM   tbl
WHERE  sentence % 'My new sentence'
ORDER  BY sentence <-> 'My new sentence'
LIMIT  10;

More:

Combine both

You can even combine FTS and trigram similarity:

Frustum answered 5/1, 2016 at 5:4 Comment(0)
R
1

it's a pretty late answer, but I'm adding in case anyone encounters. If you add ": *" to the end of the words, it will bring up similar ones. Sample: JS autocomlete -> Codeigniter:

barcode = $ this-> input-> get ("term"). ":*";

Query: $ query = 'select * from tablaneme where xx @@? LIMIT 15 '; $ barcodequery = $ this-> db-> query ($ query, array (explode ("", $ barcode)))) -> result_array ();

Rockery answered 12/6, 2019 at 23:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.