Why is PostgreSQL count(*) so slow, even with an Index Only Scan?
I have a simple count query that can use an Index Only Scan, but it still takes a long time in PostgreSQL!

I have a cars table with a type bigint column and an active boolean column, and a multi-column index on those two columns:

CREATE TABLE cars
(
    id         BIGSERIAL NOT NULL
        CONSTRAINT cars_pkey PRIMARY KEY,
    type       BIGINT NOT NULL,
    name       VARCHAR(500) NOT NULL,
    active     BOOLEAN DEFAULT TRUE NOT NULL,
    created_at TIMESTAMP(0) WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP(0) WITH TIME ZONE DEFAULT NOW(),
    deleted_at TIMESTAMP(0) WITH TIME ZONE
);
CREATE INDEX cars_type_active_index ON cars(type, active);

I inserted 950k records of test data; type=1 has 600k records:

INSERT INTO cars (type, name) (SELECT 1, 'car-name' FROM generate_series(1,600000));
INSERT INTO cars (type, name) (SELECT 2, 'car-name' FROM generate_series(1,200000));
INSERT INTO cars (type, name) (SELECT 3, 'car-name' FROM generate_series(1,100000));
INSERT INTO cars (type, name) (SELECT 4, 'car-name' FROM generate_series(1,50000));

Let's run VACUUM ANALYZE and force PostgreSQL to use an Index Only Scan:

VACUUM ANALYSE;
SET enable_seqscan = OFF;
SET enable_bitmapscan = OFF;

OK, here is a simple count query on type and active:

EXPLAIN (VERBOSE, BUFFERS, ANALYSE) 
SELECT count(*) 
FROM cars 
WHERE type = 1 AND active = true;

Result:

Aggregate  (cost=24805.70..24805.71 rows=1 width=0) (actual time=4460.915..4460.918 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=2806
  ->  Index Only Scan using cars_type_active_index on public.cars  (cost=0.42..23304.23 rows=600590 width=0) (actual time=0.051..2257.832 rows=600000 loops=1)
        Output: type, active
        Index Cond: ((cars.type = 1) AND (cars.active = true))
        Filter: cars.active
        Heap Fetches: 0
        Buffers: shared hit=2806
Planning time: 0.213 ms
Execution time: 4461.002 ms
(11 rows)

Looking at the EXPLAIN output:

  • It used an Index Only Scan. With an index-only scan, PostgreSQL sometimes still needs to fetch tuples from the table heap to check their visibility, depending on the visibility map. But I already ran VACUUM ANALYZE, and you can see Heap Fetches: 0, so reading the index alone is enough to answer this query.

  • The index is quite small and fits entirely in the buffer cache (Buffers: shared hit=2806), so PostgreSQL does not need to fetch any pages from disk.

Given all that, I can't understand why PostgreSQL takes that long (4.5 s) to answer the query. 1M records is not a big number, everything is already cached in memory, and the data in the index is visible, so it does not need to fetch from the heap.
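For reference, here is a minimal sketch (not part of my original test, just a sanity check) of how to verify those assumptions from the catalogs: pg_relation_size reports the index size, and pg_class.relallvisible shows how many table pages the visibility map marks as all-visible. When relallvisible is close to relpages, an index-only scan should need almost no heap fetches.

-- How big is the index? Small enough to stay in shared_buffers?
SELECT pg_size_pretty(pg_relation_size('cars_type_active_index'));

-- Visibility map coverage after VACUUM: relallvisible close to relpages
-- means index-only scans can skip the heap almost entirely.
SELECT relpages, relallvisible FROM pg_class WHERE relname = 'cars';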

PostgreSQL 9.5.10 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

I tested it on Docker 17.09.1-ce, on a MacBook Pro 2015.

I am still new to PostgreSQL and trying to map my knowledge to real cases. Thanks so much!

Rafiq asked 29/4, 2018 at 5:12 Comment(3)
A partial index might be more efficient for this query: CREATE INDEX cars_type_active_index ON cars(type) WHERE active; – Xylograph
Yes, creating a partial index helps a little, but not much, because almost 99% of the table is active. I am more curious about why the Index Only Scan is this slow when everything is already cached in memory and the data is all visible. – Rafiq
Here is the EXPLAIN result with the partial index:
Aggregate  (cost=24099.88..24099.89 rows=1 width=0) (actual time=4565.551..4565.554 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=7250
  ->  Index Scan using cars_type_on_active_index on public.cars  (cost=0.42..22606.33 rows=597423 width=0) (actual time=0.025..2324.765 rows=600000 loops=1)
        Output: id, type, name, active, created_at, updated_at, deleted_at
        Index Cond: (cars.type = 1)
        Buffers: shared hit=7250
Planning time: 0.095 ms
Execution time: 4565.599 ms
– Rafiq

It seems I found the reason: it is not a PostgreSQL problem, it's because of running inside Docker. When I run it directly on my Mac, the time is around 100 ms, which is fast enough.
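As a rough sanity check (a sketch, not something I measured for this answer): EXPLAIN ANALYZE reads the system clock for every row it times, and timer calls can be much slower inside Docker/VMs, so re-running the plan with per-node timing disabled shows how much of the runtime is just measurement overhead. PostgreSQL also ships a pg_test_timing utility for measuring clock overhead on the host.

-- Same query, but without per-row timing calls
-- (total execution time is still reported)
EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
SELECT count(*)
FROM cars
WHERE type = 1 AND active = true;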

Another thing I figured out is why PostgreSQL still uses a seq scan instead of an index-only scan (which is why I had to disable enable_seqscan and enable_bitmapscan in my test):

  • The table is not much bigger than the index. If I add more columns to the table, or the columns get longer, the table grows relative to the index and the index is more likely to be used.
  • The random_page_cost value is 4 by default; my disk is quite fast, so I can set it to 1-2, which helps the planner estimate costs more accurately (a sketch of how to change it follows below).
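For illustration, here is a minimal sketch of how to change it; 1.5 is just an example value for fast SSD storage, not a recommendation from my tests:

-- Try a lower random I/O cost estimate for the current session only
SET random_page_cost = 1.5;

-- Or persist it cluster-wide (requires superuser) and reload the configuration
ALTER SYSTEM SET random_page_cost = 1.5;
SELECT pg_reload_conf();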
Rafiq answered 25/9, 2018 at 4:0 Comment(0)
