How does the performance of the following two query components compare?
LOWER LIKE
... LOWER(description) LIKE '%abcde%' ...
iLIKE
... description iLIKE '%abcde%' ...
How does the performance of the following two query components compare?
LOWER LIKE
... LOWER(description) LIKE '%abcde%' ...
iLIKE
... description iLIKE '%abcde%' ...
The answer depends on many factors like Postgres version, encoding and locale - LC_COLLATE
in particular.
The bare expression lower(description) LIKE '%abc%'
is typically a bit faster than description ILIKE '%abc%'
, and either is a bit faster than the equivalent regular expression: description ~* 'abc'
. This matters for sequential scans where the expression has to be evaluated for every tested row.
But for big tables like you demonstrate in your answer one would certainly use an index. For arbitrary patterns (not only left-anchored) I suggest a trigram index using the additional module pg_trgm
. Then we talk about milliseconds instead of seconds and the difference between the above expressions is nullified.
GIN and GiST indexes (using the gin_trgm_ops
or gist_trgm_ops
operator classes) support LIKE
(~~
), ILIKE
(~~*
), ~
, ~*
(and some more variants) alike. With a trigram GIN index on description
(typically bigger than GiST, but faster for reads), your query would use description ILIKE 'case_insensitive_pattern'
.
Related:
Basics for pattern matching in Postgres:
When working with said trigram index it's typically more practical to work with:
description ILIKE '%abc%'
Or with the case-insensitive regexp operator (without %
wildcards):
description ~* 'abc'
An index on (description)
does not support queries on lower(description)
like:
lower(description) LIKE '%abc%'
And vice versa.
With predicates on lower(description)
exclusively, the expression index is the slightly better option.
In all other cases, an index on (description)
is preferable as it supports both case-sensitive and -insensitive predicates.
LOWER
LIKE
is still about 17%
faster than iLIKE
(a drop from 25%
). –
Aerobiosis LOWER LIKE
is faster (unless a trigram index is added, in which case, as you said, there is no difference)... but the point is that LOWER LIKE
is what folks should use, not iLIKE
, which will either be equivalent or slower. –
Aerobiosis According to my tests (ten of each query), LOWER
LIKE
is about 17%
faster than iLIKE
.
Explanation
I created a million rows contain some random mixed text data:
require 'securerandom'
inserts = []
1000000.times do |i|
inserts << "(1, 'fake', '#{SecureRandom.urlsafe_base64(64)}')"
end
sql = "insert into books (user_id, title, description) values #{inserts.join(', ')}"
ActiveRecord::Base.connection.execute(sql)
Verify the number of rows:
my_test_db=# select count(id) from books ;
count
---------
1000009
(Yes, I have nine extra rows from other tests - not a problem.)
Example query and results:
my_test_db=# SELECT "books".* FROM "books" WHERE "books"."published" = 'f'
my_test_db=# and (LOWER(description) LIKE '%abcde%') ;
id | user_id | title | description | published
---------+---------+-------+----------------------------------------------------------------------------------------+------
1232322 | 1 | fake | 5WRGr7oCKABcdehqPKsUqV8ji61rsNGS1TX6pW5LJKrspOI_ttLNbaSyRz1BwTGQxp3OaxW7Xl6fzVpCu9y3fA | f
1487103 | 1 | fake | J6q0VkZ8-UlxIMZ_MFU_wsz_8MP3ZBQvkUo8-2INiDIp7yCZYoXqRyp1Lg7JyOwfsIVdpPIKNt1uLeaBCdelPQ | f
1817819 | 1 | fake | YubxlSkJOvmQo1hkk5pA1q2mMK6T7cOdcU3ADUKZO8s3otEAbCdEcmm72IOxiBdaXSrw20Nq2Lb383lq230wYg | f
Results for LOWER LIKE
my_test_db=# EXPLAIN ANALYZE SELECT "books".* FROM "books" WHERE "books"."published" = 'f' and (LOWER(description) LIKE '%abcde%') ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..32420.14 rows=1600 width=117) (actual time=938.627..4114.038 rows=3 loops=1)
Filter: ((NOT published) AND (lower(description) ~~ '%abcde%'::text))
Rows Removed by Filter: 1000006
Total runtime: 4114.098 ms
Results for iLIKE
my_test_db=# EXPLAIN ANALYZE SELECT "books".* FROM "books" WHERE "books"."published" = 'f' and (description iLIKE '%abcde%') ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..29920.11 rows=100 width=117) (actual time=1147.612..4986.771 rows=3 loops=1)
Filter: ((NOT published) AND (description ~~* '%abcde%'::text))
Rows Removed by Filter: 1000006
Total runtime: 4986.831 ms
Database info disclosure
Postgres version:
my_test_db=# select version();
version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.4 on x86_64-apple-darwin12.4.0, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00), 64-bit
Collation setting:
my_test_db=# select datcollate from pg_database where datname = 'my_test_db';
datcollate
-------------
en_CA.UTF-8
Table definition:
my_test_db=# \d books
Table "public.books"
Column | Type | Modifiers
-------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('books_id_seq'::regclass)
user_id | integer | not null
title | character varying(255) | not null
description | text | not null default ''::text
published | boolean | not null default false
Indexes:
"books_pkey" PRIMARY KEY, btree (id)
LOWER
LIKE
is still about 17%
faster than iLIKE
(a drop from 25%
). –
Aerobiosis iLIKE
out performing LOWER
LIKE
? –
Aerobiosis In my rails Project. ILIKE
is almost 10x faster then LOWER LIKE
, I add a GIN
index on entities.name
column
> Entity.where("LOWER(name) LIKE ?", name.strip.downcase).limit(1).first
Entity Load (2443.9ms) SELECT "entities".* FROM "entities" WHERE (lower(name) like 'baidu') ORDER BY "entities"."id" ASC LIMIT $1 [["LIMIT", 1]]
> Entity.where("name ILIKE ?", name.strip).limit(1).first
Entity Load (285.0ms) SELECT "entities".* FROM "entities" WHERE (name ilike 'Baidu') ORDER BY "entities"."id" ASC LIMIT $1 [["LIMIT", 1]]
# explain analyze SELECT "entities".* FROM "entities" WHERE (name ilike 'Baidu') ORDER BY "entities"."id" ASC LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3186.03..3186.04 rows=1 width=1588) (actual time=7.812..7.812 rows=1 loops=1)
-> Sort (cost=3186.03..3187.07 rows=414 width=1588) (actual time=7.811..7.811 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 26kB
-> Bitmap Heap Scan on entities (cost=1543.21..3183.96 rows=414 width=1588) (actual time=7.797..7.805 rows=1 loops=1)
Recheck Cond: ((name)::text ~~* 'Baidu'::text)
Rows Removed by Index Recheck: 6
Heap Blocks: exact=7
-> Bitmap Index Scan on index_entities_on_name (cost=0.00..1543.11 rows=414 width=0) (actual time=7.787..7.787 rows=7 loops=1)
Index Cond: ((name)::text ~~* 'Baidu'::text)
Planning Time: 6.375 ms
Execution Time: 7.874 ms
(12 rows)
GIN index is really helpful to improve ILIKE
performance
name
does not support queries on lower(name)
. I mentioned that in my answer. That's most probably the reason for the 10x performance difference. (You'll see sequential scans instead of (bitmap) index scans. –
Shanonshanta © 2022 - 2024 — McMap. All rights reserved.