Proper full text index Rails/PostgreSQL/pg_search
Asked Answered
S

1

10

I am testing performance for PostgreSQL full text search (using pg_search gem) and solr (sunspot_solr gem).

For 4 million records I am getting 13456 ms for Tsearch and 800 ms with SOLR (that is SOLR query + DB retrival). It is obvious that I need index but I am not sure how to create one for full text search. I investigated and found that for full text search I should use GIN index.

execute "CREATE INDEX products_gin_title ON products USING GIN(to_tsvector('english', title))"

But I am searching via two more columns and I need multi-value index and I am not sure how to implement it? I am not very familiar with DB part. My search code looks like:

@results = Product.search_title(params[:search_term]).where("platform_id=? AND product_type=?", params[:platform_id], params[:type_id]).limit(10).all

How do I create proper query for this type of situations?

Here is SQL output from rails for search term car.

Product Load (12494.0ms)
SELECT 
    "products".*, 
    ( ts_rank((to_tsvector('simple', coalesce("products"."title"::text, ''))), (to_ tsquery('simple', ''' ' || 'car' || ' ''')), 2) ) AS pg_search_rank 
FROM "products" 
WHERE (((to_tsvector('simple', coalesce("products"."tit le"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'car' || ' ''')))) 
    AND (platform_id='26' AND product_type='2') 
ORDER BY pg_search_rank DESC, "products"."id" ASC 
LIMIT 10

EDIT:

I am using PostgreSQL 8.4.11, EXPLAIN ANALYZE output is following.

Limit  (cost=108126.34..108126.36 rows=10 width=3824) (actual time=12228.736..12228.738 rows=10 loops=1)   
->  Sort (cost=108126.34..108163.84 rows=14999 width=3824) (actual time=12228.733..12228.734 rows=10 loops=1)
    Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)), '''car'''::tsquery, 2)), id
    Sort Method:  top-N heapsort  Memory: 18kB
    ->  Seq Scan on products  (cost=0.00..107802.22 rows=14999 width=3824) (actual time=7.532..12224.585 rows=977 loops=1)
        Filter: ((platform_id = 26) AND (product_type = 2) AND (to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)) @@ '''car'''::tsquery)) 

Total runtime: 12228.813 ms
Sixfooter answered 4/6, 2012 at 11:14 Comment(1)
Please post the output of running EXPLAIN ANALYZE on the posted query, plus your Pg version, any postgresql.conf params you've changed, etc.Chronometry
M
10

This expression:

to_tsvector('simple', (COALESCE(title::TEXT), ''))

is not sargable against your index.

You should declare the index on the exactly that expression which is used in the query:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', COALESCE(title::TEXT,'')))

(or make ruby generate the expression which is used in the index).

If you want multiple columns to be indexed, just concatenate them:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', title || ' ' || product_type || ' ' || platform_id))

but again, Ruby should be filtering on exactly same expression for the index to be of use.

Mingrelian answered 4/6, 2012 at 11:55 Comment(4)
Thank you fist single column index worked now query time is 80 ms, nice! Question for you in addition to this is should it be multi column index to further increase speed? When you say concatenate them, does that mean I should do full text search by concatenating FK ids to my search term?Sixfooter
In this example I am searching for title via full text, and also narrowing results with product_type, platform_id (mentioned FKs). I am under impression that if you want to speed up specific query you need to add index for ALL columns used in that query.Sixfooter
@Dolphin: you cannot create composite GIN indexes. Well, you can but it's a very complex task which requires creating custom operator classes and other dark magic. This also requires SUPER privilege (which means you cannot do this on hosted databases etc.)Mingrelian
This cut my search time down from 3000ms to 60ms. Thank you.Haemo

© 2022 - 2024 — McMap. All rights reserved.