Can a View of multiple tables be used for Full-Text-Search?
Asked Answered
A

1

6

I'm sorry to ask such a noob question, but the postgres documentation on views is sparse, and I had trouble finding a good answer.

I'm trying to implement Full-Text-Search on Postgres for three tables. Specifically, the user's search query would return matching 1) other usernames, 2) message, 3) topics.

I'm concerned that using a view for this might not scale well as it combines three tables into one. Is this a legitimate concern? If not, how else might I approach this?

Amaro answered 15/7, 2017 at 23:39 Comment(1)
You can do such a thing. Make sure that the needed columns (or expressions) on the original tables have the right indexes. Also, test the execution plans produced when you query the view, to make sure they show index usage. Otherwise, it will work, but really very slowly. Alternatively, you can have a materialized view, and have the indexes directly on it; and make sure to update it frequently enough.Smetana
S
14

What you request can be done. To have a practical example (with just two tables), you could have:

CREATE TABLE users
(
    user_id SERIAL PRIMARY KEY,
    username text
) ;

-- Index to find usernames
CREATE INDEX idx_users_username_full_text 
    ON users 
    USING GIN (to_tsvector('english', username)) ;        

CREATE TABLE topics
(
    topic_id SERIAL PRIMARY KEY,
    topic text
) ;

-- Index to find topics
CREATE INDEX idx_topics_topic_full_text 
    ON topics 
    USING GIN (to_tsvector('english', topic)) ;

See PostgreSQL docs. on Controlling Text Search for an explanation of to_tsvector.

... populate the tables

INSERT INTO users
   (username)
VALUES
   ('Alice Cooper'),
   ('Boo Geldorf'),
   ('Carol Burnet'),
   ('Daniel Dafoe') ;

INSERT INTO topics
   (topic)
VALUES
   ('Full text search'),
   ('Fear of void'),
   ('Alice in Wonderland essays') ;

... create a view that combines values from both tables

CREATE VIEW search_items AS
SELECT 
    text 'users' AS origin_table, user_id AS id, to_tsvector('english', username) AS searchable_element
FROM
    users
UNION ALL
SELECT 
    text 'topics' AS origin_table, topic_id AS id, to_tsvector('english', topic) AS searchable_element 
FROM
    topics ;

We search that view:

SELECT 
    *
FROM
    search_items
WHERE
    plainto_tsquery('english', 'alice') @@ searchable_element

... and get the following response (you should mostly ignore the searchable_element). You're mostly interested in the origin_table and id.

origin_table | id | searchable_element               
:----------- | -: | :--------------------------------
users        |  1 | 'alic':1 'cooper':2              
topics       |  3 | 'alic':1 'essay':4 'wonderland':3

See Parsing Queries for an explanation of plainto_tsquery function, and also @@ operator.


To make sure indexes are used:

EXPLAIN ANALYZE
SELECT 
    *
FROM
    search_items
WHERE
    plainto_tsquery('english', 'alice') @@ searchable_element
| QUERY PLAN                                                                                                                                 |
| :----------------------------------------------------------------------------------------------------------------------------------------- |
| Append  (cost=12.05..49.04 rows=12 width=68) (actual time=0.017..0.031 rows=2 loops=1)                                                     |
|   ->  Bitmap Heap Scan on users  (cost=12.05..24.52 rows=6 width=68) (actual time=0.017..0.018 rows=1 loops=1)                             |
|         Recheck Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, username))                                                 |
|         Heap Blocks: exact=1                                                                                                               |
|         ->  Bitmap Index Scan on idx_users_username_full_text  (cost=0.00..12.05 rows=6 width=0) (actual time=0.005..0.005 rows=1 loops=1) |
|               Index Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, username))                                             |
|   ->  Bitmap Heap Scan on topics  (cost=12.05..24.52 rows=6 width=68) (actual time=0.012..0.012 rows=1 loops=1)                            |
|         Recheck Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, topic))                                                    |
|         Heap Blocks: exact=1                                                                                                               |
|         ->  Bitmap Index Scan on idx_topics_topic_full_text  (cost=0.00..12.05 rows=6 width=0) (actual time=0.002..0.002 rows=1 loops=1)   |
|               Index Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, topic))                                                |
| Planning time: 0.098 ms                                                                                                                    |
| Execution time: 0.055 ms                                                                                                                   |

Indexes are really used (see Bitmap Index Scan on idx_topics_topic_full_text and Bitmap Index Scan on idx_users_username_full_text).

You can check everything at dbfiddle here


NOTE: 'english' is the text search configuration chosen to index and query. Choose the proper one for your case. You can create your own if the existing ones don't fill your needs.

Smetana answered 16/7, 2017 at 0:8 Comment(3)
Thanks, this is really helpful. Speed is a concern though. Have you any experience of the performance with a large number of records i.e. 2M+? In the case of materialized view, due to its limitations of updating is it safe to say that it's not practical to use for search?Amaro
It seems that the best solution is to create a table and triggers to its derivative content pulled from other tables.Amaro
I've got experience in the range of 100.000s. Search is fast. The problem normally is sorting (using ts_rank) if you have 100s or 1000s of results, because that function is compute-intensive. If you go the trigger way, I'd suggest you store only the to_tsvector result in that table, and index it. I don't think it will be much faster, unless you deal with more than 5 tables.Smetana

© 2022 - 2024 — McMap. All rights reserved.