PostgreSQL SELECT COUNT query takes a long time
I have a table named events in my PostgreSQL 9.5 database, and this table has about 6 million records.

I am running a select count(event_id) from events query, but it takes about 40 seconds, which is a very long time for a database. The event_id column is the table's primary key and is indexed. Why does this take so long? (The server is an Ubuntu VM on VMware with 4 CPUs.)

Explain:

"Aggregate  (cost=826305.19..826305.20 rows=1 width=0) (actual time=24739.306..24739.306 rows=1 loops=1)"
"  Buffers: shared hit=13 read=757739 dirtied=53 written=48"
"  ->  Seq Scan on event_source  (cost=0.00..812594.55 rows=5484255 width=0) (actual time=0.014..24087.050 rows=6320689 loops=1)"
"        Buffers: shared hit=13 read=757739 dirtied=53 written=48"
"Planning time: 0.369 ms"
"Execution time: 24739.364 ms"
Obara answered 6/3, 2019 at 8:53 Comment(3)
Edited the post to add the explain output. – Obara
757739 blocks for just 6 million rows seems quite a lot; that's roughly 8 rows per block. How many columns does the table have? Does it have several long text columns (with less than 2k in them, so they are stored inline)? It could also be table bloat. Does anything change if you clean up the table using vacuum full events;? (See the size-check sketch after these comments.) – Diller
There are 7 columns in the table, but one column has type text and holds long JSON data. – Obara
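
A quick way to sanity-check the rows-per-block figure from the comments above (a sketch, assuming the table name events from the question and the default 8 kB block size):

SELECT pg_size_pretty(pg_relation_size('events'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('events')) AS total_size,    -- heap + indexes + TOAST
       (SELECT reltuples::bigint FROM pg_class WHERE relname = 'events') AS estimated_rows,
       pg_relation_size('events') / 8192                AS heap_blocks;   -- assumes 8 kB blocks

If estimated_rows divided by heap_blocks is low (only a few rows per block), the rows are either very wide or the table is bloated.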
I know that this is an old question and the existing answer covers most of the info around this, but I just ran into a situation where a table of 1.3 million rows was taking about 35 seconds to perform a simple SELECT COUNT(*). None of the other solutions helped. The issue turned out to be that the table was bloated and hadn't been vacuumed, so Postgres couldn't figure out the optimal way to query the data. After I ran this, the query time dropped to about 25 ms!

VACUUM (ANALYZE, VERBOSE, FULL) my_table_name;

Hope this helps someone else!
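
Before reaching for VACUUM FULL (which rewrites the whole table under an exclusive lock), it can be worth checking whether the table really carries a lot of dead tuples; a minimal sketch, reusing the placeholder name my_table_name from above:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table_name';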

Bushwhacker answered 26/6, 2020 at 12:43 Comment(6)
Alec, you’re a lifesaver! I spent hours trying to figure out why my COUNT queries were taking ~8 minutes to run. It turns out Postgres’ autovacuum hadn’t been enough and a manual vacuum was needed. – Caril
Glad to help :) – Bushwhacker
@AlecSanger How much time does this VACUUM command take? I have three million records in my table. The query is running in the background, so I'm not able to verify. (See the progress-check sketch after these comments.) – Tubman
Vacuum could take a while depending on how much data is in there and what state it's in. I've had it take something like 10 minutes in the past. The next time you run it, though, it should be super quick, which would confirm that the table was indeed vacuumed. – Bushwhacker
This will lock your table (VACUUM FULL takes an ACCESS EXCLUSIVE lock)! Potentially for a long time! For our table with 20M records, it took 4 minutes. – Unicorn
Watch out for VERBOSE combined with your log_statement setting, as the extra logging might run your instance out of storage space. – Moniz
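
Regarding monitoring a vacuum that is already running (asked in the comments above): from 9.6 on, plain VACUUM reports its progress in pg_stat_progress_vacuum; note that VACUUM FULL is a table rewrite and shows up separately, in pg_stat_progress_cluster, from version 12 on. A sketch:

SELECT pid, phase, heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;   -- plain VACUUM (9.6+)

SELECT pid, phase, heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_cluster;  -- VACUUM FULL / CLUSTER (12+)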
There are multiple factors that play a big role in how PostgreSQL decides to execute the count(), but first of all: the column you use inside the count function does not matter. In fact, if you don't need a DISTINCT count, stick with count(*).

You can try the following to force an index-only scan:

SELECT count(*) FROM (SELECT event_id FROM events) t;

...if that still results in a sequential scan, then most likely the index is not much smaller than the table itself. To see how an index-only scan would perform anyway, you can force it with:

SELECT count(*) FROM (SELECT event_id FROM events ORDER BY 1) t;
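
To check whether the index really is about as large as the table (the condition described above), compare their on-disk sizes; this sketch assumes the default primary-key index name events_pkey:

SELECT pg_size_pretty(pg_relation_size('events'))      AS table_size,
       pg_size_pretty(pg_relation_size('events_pkey')) AS index_size;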

If that is not much faster, you should also consider upgrading PostgreSQL to at least version 9.6, which introduced parallel sequential scans to speed up exactly this kind of query.
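
On 9.6 or later, a quick way to check whether a parallel plan kicks in is to raise the per-query worker limit for your session and re-run the explain (a sketch; the actual speedup depends on CPUs and I/O):

SET max_parallel_workers_per_gather = 4;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM events;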

In addition, you can achieve dramatic speedups by choosing from a variety of techniques to provide counts, largely depending on your use case and requirements, such as fast estimates based on planner statistics or counter tables maintained by triggers.
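
For example, if an approximate number is good enough, the planner statistics give a near-instant estimate (a sketch; accuracy depends on how recently autovacuum or ANALYZE ran):

SELECT reltuples::bigint AS estimated_count
FROM pg_class
WHERE relname = 'events';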

Last but not least, please always provide the output of an extended explain, as @a_horse_with_no_name already recommended, e.g.:

EXPLAIN (ANALYZE, BUFFERS) SELECT count(event_id) FROM events;
Sibyls answered 6/3, 2019 at 11:24 Comment(0)
