Slow count(*) on PostgreSQL 9.2

There are a few discussions here and there (including the official post on the Postgres wiki) about the slow count(*) prior to version 9.2; somehow I did not find a satisfying answer.

Basically I had Postgres 9.1 installed, and I observed a slow count(*) on a query as simple as

select count(*) from restaurants;

on tables with 100k+ records. The average request takes around 850 ms. I assumed that this was the symptom people have been talking about for slow counts on Postgres 9.1 and below, since Postgres 9.2 has new features like index-only scans. I wanted to test this by taking the same dataset from 9.1 and loading it into 9.2. I ran the count statement, and it still gives results as bad as 9.1:

explain analyze select count(*) from restaurants;
------------------------------------------------------------------
Aggregate  (cost=23510.35..23510.36 rows=1 width=0) (actual time=979.960..979.961 rows=1 loops=1)
   ->  Seq Scan on restaurants  (cost=0.00..23214.88 rows=118188 width=0) (actual time=0.050..845.097 rows=118188 loops=1)
 Total runtime: 980.037 ms

Can anyone suggest a feasible solution to this problem? Do I need to configure anything in Postgres to enable the feature?

P.S. A WHERE clause doesn't help in my case either.

Cohin answered 13/12, 2012 at 7:50 Comment(4)
Did you read wiki.postgresql.org/wiki/Index-only_scans? There's a discussion of count and the limitations around it there. Is there a primary key on that table? Did you VACUUM and ANALYZE the table after loading the data? — Lattonia
Also, what are your random_page_cost and seq_page_cost set to? How about effective_cache_size? — Lattonia
Thank you Craig for the link. I read it and ran VACUUM as you suggested, and it improved the speed from 850 ms to 400 ms. Still, 400 ms is expensive. Are there any extra tuning methods to refine this further? — Cohin
What's the EXPLAIN (BUFFERS, ANALYZE) output for the query after the VACUUM ANALYZE? And what's the output of SELECT avg(pg_column_size(restaurants)) FROM restaurants? — Lattonia

See the index-only scans wiki entry: wiki.postgresql.org/wiki/Index-only_scans

In particular, I quote:

It is important to realise that the planner is concerned with minimising the total cost of the query. With databases, the cost of I/O typically dominates. For that reason, "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's row width is much wider than some indexes'.
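You can check directly whether the planner is willing to use an index-only scan. As a sketch, assuming the table has a narrow integer column named id (an assumption; adjust to your schema) and no usable narrow index yet:

```sql
-- A narrow index gives the planner a much smaller relation to scan for count(*).
-- "id" is assumed to be an integer column; substitute your actual key column.
CREATE INDEX restaurants_id_idx ON restaurants (id);

-- Check whether the planner now chooses "Index Only Scan" instead of "Seq Scan":
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM restaurants;
```

If a primary-key index already exists, the EXPLAIN alone tells you whether the planner considers it cheaper than the heap; per the quote above, it will only prefer it when the index is significantly smaller than the table.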

See also the discussion of VACUUM and ANALYZE for maintaining the visibility map. Essentially, you probably want to make VACUUM more aggressive, and you'll want to manually VACUUM ANALYZE the table after you first load it.
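The maintenance steps above can be sketched as follows; the autovacuum scale factors are illustrative values, not recommendations:

```sql
-- After the initial bulk load, vacuum and analyze so the visibility map
-- is populated (index-only scans depend on it) and statistics are fresh:
VACUUM ANALYZE restaurants;

-- Make autovacuum more aggressive for this one table so the visibility
-- map stays current (thresholds here are illustrative):
ALTER TABLE restaurants SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```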

Lattonia answered 13/12, 2012 at 7:58 Comment(0)

This happens due to PostgreSQL's MVCC implementation. In short, in order to count the table rows, PostgreSQL needs to verify that each row is actually visible to the current transaction. Given the multiple snapshots/versions of each record, PostgreSQL cannot answer the count from metadata alone. Instead, it reads each row, performing a sequential scan.

How to fix this?

There are different approaches to fixing this issue, including a trigger-based mechanism. If an estimated row count is acceptable for you, you can use the reltuples column of PostgreSQL's pg_class catalog:

    SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = '<table_name>';

Reltuples:

[It is the] Number of rows in the table. This is only an estimate used by the planner. — PostgreSQL: Documentation: pg_class
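For an exact count without the sequential scan, the trigger-based mechanism mentioned above maintains a counter in a side table. A minimal sketch, where all table, function, and trigger names are illustrative:

```sql
-- Side table holding the exact row count (names are illustrative).
CREATE TABLE restaurants_count (n bigint NOT NULL);
INSERT INTO restaurants_count SELECT count(*) FROM restaurants;

-- Keep the counter in sync on every insert and delete.
CREATE FUNCTION restaurants_count_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE restaurants_count SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE restaurants_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER restaurants_count_trg
AFTER INSERT OR DELETE ON restaurants
FOR EACH ROW EXECUTE PROCEDURE restaurants_count_trg();

-- Fast exact count:
SELECT n FROM restaurants_count;
```

Note the trade-off: every write to restaurants now pays for an extra UPDATE, and the single counter row becomes a contention point under concurrent writes.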

More info:

https://medium.com/@vinicius_edipo/handling-slow-counting-with-elixir-postgresql-f5ff47f3d5b9

http://www.varlena.com/GeneralBits/120.php

https://www.postgresql.org/docs/current/static/catalog-pg-class.html

Seen answered 4/4, 2018 at 0:42 Comment(0)
