Optimal Postgres text index for LIKE query?
Asked Answered
C

2

9

Using Postgres 9.5, I have a table addresses.

CREATE TABLE addresses (
    id        integer PRIMARY KEY,
    address   text
);

In that table I have 7.5 million rows. Example:

1, "1600 Pennsylvania Avenue NW, Washington, DC, 20500"

I'm using this table for an autosuggest search in my application so I need to use this type of query:

SELECT * FROM addresses WHERE address LIKE '123 Main St%';

I created this index:

CREATE INDEX address_idx ON addresses (address);

But the problem is it's taking around 1 sec which is too slow.

Here's the query plan:

EXPLAIN SELECT * FROM addresses WHERE address LIKE '123 Main St%';
----
Seq Scan on addresses  (cost=0.00..161309.76 rows=740 width=41)
  Filter: (address ~~ '123 Main St%'::text)

I tried creating a few types of gin indexes but they either had no effect or made the query slower. I'm not sure if I was using them correctly though.

Any ideas on how to create an index that's optimized for this kind of query?


EDIT

The best solution found so far is to use a text range scan:

SELECT *
FROM addresses
WHERE address >= '123 Main St' AND
      address <= concat('123 Main St', 'z');
Cadre answered 9/3, 2016 at 0:54 Comment(4)
Will this always be a prefix search? Then you can try WHERE address BETWEEN '123 Main St' AND '123 Main Su'. That should produce a range scan on the index.Horick
The index should work fine for this query. Perhaps there is an incompatibility with the text types somehow that prevents the use of the index.Colorific
@Horick Thanks! Using BETWEEN produces the same results as the LIKE query and reduced the time down to 13ms. To answer your question, yes this will always be a prefix search. The only thing I don't like about this method is I'll have to come up with the next letter in the alphabet or the next number instead of using a wildcard symbol. Is there any other way to produce a range scan without having to write in that kind of logic?Cadre
Ideally, the query planner should have detected the prefix search and transformed it into a BETWEEN query. And even if not, worst case should have been an index scan (instead of a table scan) -- not that that would have been any faster in this case.Horick
C
5

This is an elaboration on the between method and too long for a comment.

If you are using standard ASCII characters, you can use the tilde-trick:

SELECT *
FROM addresses
WHERE address >= '123 Main St' AND
      address <= concat('123 Main St', '~');

Tilde has a larger ASCII value than other characters.

I do note that Postgres should use the index for the LIKE query as well. My guess is that the problem is something to do with compatibility of the types. Perhaps if you converted the pattern to a varchar(), Postgres would use the index.

Colorific answered 9/3, 2016 at 1:57 Comment(8)
Thanks for the reply. Pretty clever indeed but I'm running into trouble with that query: ERROR: argument of AND must be type boolean, not type text - on line 4. I'm using standard ASCII characters btw. Could you elaborate on what you mean by convert the pattern to a varchar()? Do you mean the column?Cadre
@Cadre . . . That might be due to operator precedence.Colorific
Removing the double pipe and adding the tilde, i.e. '123 Main St~', solves the error but returns no results. I also tried '123 Main S~' which didn't work. Referencing the ASCII table, it seems like 'z' is the largest ASCII value character that I can use to return any results. Edit: concat('123 Main St', 'z') works.Cadre
@Cadre . . . Interesting. The tilde has worked for me in other contexts. At least the z works, though.Colorific
Thanks for your help. I think the range scan with your concat suggestion is the best solution!Cadre
@GordonLinoff SELECT * FROM addresses WHERE address LIKE '%123 Main St%'; What if wild card is before and after the 123 Main St in this Case. What should I do to make the scan fasterPerri
@Perri . . . That would be a different question. You should investigate GIN and related indexes and if you need more help, ask a new question as a question.Colorific
For me z works out of the box. ~ works when I use ~>=~ instead of >=. It can also use a text_pattern_ops index, although z and >= is about as fast for me.Scarberry
S
4

Three things you could try:

  1. If your database is on 'C' locale (which you can check with \l on the psql prompt) then the regular Btree indexes should help in optimizing a LIKE 'abc%' type of query.
  2. If not, you could try using a suitable operator class when creating the Btree index. For e.g. CREATE INDEX tbl_col_text_pattern_ops_idx ON tbl(col text_pattern_ops);
  3. If that doesn't work, you could also try using the GiST / GIN, more detailing for which is given here.

If you'd like to know more you should read Erwin's StackOverflow answer here, that details how different Postgres indexes work with LIKE / ILIKE.

Saltant answered 9/3, 2016 at 5:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.