LOWER LIKE vs iLIKE
Asked Answered
A

3

85

How does the performance of the following two query components compare?

LOWER LIKE

... LOWER(description) LIKE '%abcde%' ...

iLIKE

... description iLIKE '%abcde%' ...
Aerobiosis answered 2/12, 2013 at 19:55 Comment(3)
What is your goal? Do you have a slow SQL command that you want to speed it up or is it just a general question on PostgreSQL?Regnant
Stack Overflow hasn't (yet?) #hashtags (unless you are asking a C or C++ question).Ackley
@MartinStrejc my goal is just a general question on PostgreSQL. Presented with the choice of these two out-of-the-box solutions, I wanted to know which to use.Aerobiosis
S
84

The answer depends on many factors like Postgres version, encoding and locale - LC_COLLATE in particular.

The bare expression lower(description) LIKE '%abc%' is typically a bit faster than description ILIKE '%abc%', and either is a bit faster than the equivalent regular expression: description ~* 'abc'. This matters for sequential scans where the expression has to be evaluated for every tested row.

But for big tables like you demonstrate in your answer one would certainly use an index. For arbitrary patterns (not only left-anchored) I suggest a trigram index using the additional module pg_trgm. Then we talk about milliseconds instead of seconds and the difference between the above expressions is nullified.

GIN and GiST indexes (using the gin_trgm_ops or gist_trgm_ops operator classes) support LIKE (~~), ILIKE (~~*), ~, ~* (and some more variants) alike. With a trigram GIN index on description (typically bigger than GiST, but faster for reads), your query would use description ILIKE 'case_insensitive_pattern'.

Related:

Basics for pattern matching in Postgres:

When working with said trigram index it's typically more practical to work with:

description ILIKE '%abc%'

Or with the case-insensitive regexp operator (without % wildcards):

description ~* 'abc'

An index on (description) does not support queries on lower(description) like:

lower(description) LIKE '%abc%'

And vice versa.

With predicates on lower(description) exclusively, the expression index is the slightly better option.

In all other cases, an index on (description) is preferable as it supports both case-sensitive and -insensitive predicates.

Shanonshanta answered 2/12, 2013 at 20:46 Comment(5)
I will consider your suggestion, though I just wanted to clarify that my intent was to compare these two out-of-the-box solutions. I re-ran my tests and have updated my answer to reflect mixed-case data. I also added details about my database. It turns out that LOWER LIKE is still about 17% faster than iLIKE (a drop from 25%).Aerobiosis
Worth noting that the reported 17% were for sequential scans without index. I got 2% for a similar test, see other comment. And neither applies if a trigram index is added to the setup - which nullifies the difference.Shanonshanta
Thanks for the follow-up. Would you think it fair to add this comment to your answer? - I think that the selected answer should conclude that LOWER LIKE is faster (unless a trigram index is added, in which case, as you said, there is no difference)... but the point is that LOWER LIKE is what folks should use, not iLIKE, which will either be equivalent or slower.Aerobiosis
@user664833: Well, no, not generally. I clarified above.Shanonshanta
Doesn't the use of like vs ilike depend of how you stored your data? If you have 'Joe' in the DB, then you need to lower twice: `lower(input) LIKE lower('%Joe%'). That's why ILIKE was created I suppose...Tyree
A
39

According to my tests (ten of each query), LOWER LIKE is about 17% faster than iLIKE.

Explanation

I created a million rows contain some random mixed text data:

require 'securerandom'
inserts = []
1000000.times do |i|
        inserts << "(1, 'fake', '#{SecureRandom.urlsafe_base64(64)}')"
end
sql = "insert into books (user_id, title, description) values #{inserts.join(', ')}"
ActiveRecord::Base.connection.execute(sql)

Verify the number of rows:

my_test_db=# select count(id) from books ;
  count  
---------
 1000009

(Yes, I have nine extra rows from other tests - not a problem.)

Example query and results:

my_test_db=# SELECT "books".* FROM "books" WHERE "books"."published" = 'f'
my_test_db=# and (LOWER(description) LIKE '%abcde%') ;
   id    | user_id | title |                                      description                                       | published 
---------+---------+-------+----------------------------------------------------------------------------------------+------
 1232322 |       1 | fake  | 5WRGr7oCKABcdehqPKsUqV8ji61rsNGS1TX6pW5LJKrspOI_ttLNbaSyRz1BwTGQxp3OaxW7Xl6fzVpCu9y3fA | f
 1487103 |       1 | fake  | J6q0VkZ8-UlxIMZ_MFU_wsz_8MP3ZBQvkUo8-2INiDIp7yCZYoXqRyp1Lg7JyOwfsIVdpPIKNt1uLeaBCdelPQ | f
 1817819 |       1 | fake  | YubxlSkJOvmQo1hkk5pA1q2mMK6T7cOdcU3ADUKZO8s3otEAbCdEcmm72IOxiBdaXSrw20Nq2Lb383lq230wYg | f

Results for LOWER LIKE

my_test_db=# EXPLAIN ANALYZE SELECT "books".* FROM "books" WHERE "books"."published" = 'f' and (LOWER(description) LIKE '%abcde%') ;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on books  (cost=0.00..32420.14 rows=1600 width=117) (actual time=938.627..4114.038 rows=3 loops=1)
   Filter: ((NOT published) AND (lower(description) ~~ '%abcde%'::text))
   Rows Removed by Filter: 1000006
 Total runtime: 4114.098 ms

Results for iLIKE

my_test_db=# EXPLAIN ANALYZE SELECT "books".* FROM "books" WHERE "books"."published" = 'f' and (description iLIKE '%abcde%') ;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on books  (cost=0.00..29920.11 rows=100 width=117) (actual time=1147.612..4986.771 rows=3 loops=1)
   Filter: ((NOT published) AND (description ~~* '%abcde%'::text))
   Rows Removed by Filter: 1000006
 Total runtime: 4986.831 ms

Database info disclosure

Postgres version:

my_test_db=# select version();
                                                                                 version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-apple-darwin12.4.0, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00), 64-bit

Collation setting:

my_test_db=# select datcollate from pg_database where datname = 'my_test_db';
 datcollate  
-------------
 en_CA.UTF-8

Table definition:

my_test_db=# \d books 
                                      Table "public.books"
   Column    |            Type             |                       Modifiers
-------------+-----------------------------+-------------------------------------------------------
 id          | integer                     | not null default nextval('books_id_seq'::regclass)
 user_id     | integer                     | not null
 title       | character varying(255)      | not null
 description | text                        | not null default ''::text
 published   | boolean                     | not null default false
Indexes:
    "books_pkey" PRIMARY KEY, btree (id)
Aerobiosis answered 2/12, 2013 at 19:55 Comment(7)
Your test case is one-sided, there are only upper-case letters in the data. Also, in a real-life application, you would be operating with indices, which changes the whole assessment. And essential details are undisclosed: Postgres version, your collation setting, your exact table definition.Shanonshanta
@ErwinBrandstetter I re-ran my tests and have updated my answer to reflect mixed-case data. I also added details about my database. LOWER LIKE is still about 17% faster than iLIKE (a drop from 25%).Aerobiosis
+1 Much better now. I wouldn't say "x is 17% faster than y", though, since this is only true for your particular test case. The length of the string is also relevant, btw.Shanonshanta
@ErwinBrandstetter - I realize that precise percentage results will vary but I also thought that just saying "x is faster than y" was too open ended. I think that with your comment folks who are sufficiently curious will get a more complete picture. By the way, are you aware of a condition of string length, collation setting, or something else, which would consistently result in iLIKE out performing LOWER LIKE?Aerobiosis
No. Not sure, though. I often see much closer results. Ran a quick test on a real-life table in Postgres 9.1 with 1.05M rows and real-life "descriptions", COLLATON de_AT.UTF-8, OS Debian Linux. LOWER / LIKE was ~ 2% faster.Shanonshanta
Curious what happens if you index description field. Can ILIKE take the same advantage of the index as LIKE? Normally I would not be searching a column without an index.Patina
@Speedy99: I addressed that in an update to my answer.Shanonshanta
N
3

In my rails Project. ILIKE is almost 10x faster then LOWER LIKE, I add a GIN index on entities.name column

> Entity.where("LOWER(name) LIKE ?", name.strip.downcase).limit(1).first
Entity Load (2443.9ms)  SELECT  "entities".* FROM "entities" WHERE (lower(name) like 'baidu') ORDER BY "entities"."id" ASC LIMIT $1  [["LIMIT", 1]]
> Entity.where("name ILIKE ?", name.strip).limit(1).first
Entity Load (285.0ms)  SELECT  "entities".* FROM "entities" WHERE (name ilike 'Baidu') ORDER BY "entities"."id" ASC LIMIT $1  [["LIMIT", 1]]
# explain analyze SELECT  "entities".* FROM "entities" WHERE (name ilike 'Baidu') ORDER BY "entities"."id" ASC LIMIT 1;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3186.03..3186.04 rows=1 width=1588) (actual time=7.812..7.812 rows=1 loops=1)
   ->  Sort  (cost=3186.03..3187.07 rows=414 width=1588) (actual time=7.811..7.811 rows=1 loops=1)
         Sort Key: id
         Sort Method: quicksort  Memory: 26kB
         ->  Bitmap Heap Scan on entities  (cost=1543.21..3183.96 rows=414 width=1588) (actual time=7.797..7.805 rows=1 loops=1)
               Recheck Cond: ((name)::text ~~* 'Baidu'::text)
               Rows Removed by Index Recheck: 6
               Heap Blocks: exact=7
               ->  Bitmap Index Scan on index_entities_on_name  (cost=0.00..1543.11 rows=414 width=0) (actual time=7.787..7.787 rows=7 loops=1)
                     Index Cond: ((name)::text ~~* 'Baidu'::text)
 Planning Time: 6.375 ms
 Execution Time: 7.874 ms
(12 rows)

GIN index is really helpful to improve ILIKE performance

Neurocoele answered 4/9, 2019 at 9:0 Comment(2)
An index on name does not support queries on lower(name). I mentioned that in my answer. That's most probably the reason for the 10x performance difference. (You'll see sequential scans instead of (bitmap) index scans.Shanonshanta
@ErwinBrandstetter Yes, thank you, I just get this in my project, so I put it here to show how index change the performance rule. :)Neurocoele

© 2022 - 2024 — McMap. All rights reserved.