How reliable is the cost measurement in PostgreSQL Explain Plan?

The queries are performed on a large table with 11 million rows. I ran ANALYZE on the table before executing the queries.
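
For reference, the statistics were refreshed with a plain ANALYZE; the pg_stats query below is just an illustrative way to check what the planner learned about the join columns, not something from the original post:

ANALYZE accounts;

-- Illustrative sanity check of the gathered statistics:
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'accounts'
  AND attname IN ('account_no', 'effective_date');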

Query 1:

SELECT *
FROM accounts t1
LEFT OUTER JOIN accounts t2 
    ON (t1.account_no = t2.account_no
        AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;

Explain Analyze:

Hash Anti Join  (cost=480795.57..1201111.40 rows=7369854 width=292) (actual time=29619.499..115662.111 rows=1977871 loops=1)
  Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
  Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
  ->  Seq Scan on accounts t1  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.025..25693.921 rows=11034070 loops=1)
  ->  Hash  (cost=342610.81..342610.81 rows=11054781 width=146) (actual time=29612.925..29612.925 rows=11034070 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 1834187kB
        ->  Seq Scan on accounts t2  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.006..22929.635 rows=11034070 loops=1)
Total runtime: 115870.788 ms

The estimated cost is ~1.2 million and the actual time taken is ~1.9 minutes.

Query 2:

SELECT t1.*
FROM accounts t1
LEFT OUTER JOIN accounts t2 
    ON (t1.account_no = t2.account_no
        AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;

Explain Analyze:

Hash Anti Join  (cost=480795.57..1201111.40 rows=7369854 width=146) (actual time=13365.808..65519.402 rows=1977871 loops=1)
  Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
  Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
  ->  Seq Scan on accounts t1  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.007..5032.778 rows=11034070 loops=1)
  ->  Hash  (cost=342610.81..342610.81 rows=11054781 width=18) (actual time=13354.219..13354.219 rows=11034070 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 545369kB
        ->  Seq Scan on accounts t2  (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.011..8964.571 rows=11034070 loops=1)
Total runtime: 65705.707 ms

The estimated cost is ~1.2 million (again), but the actual time taken is just under 1.1 minutes.

Query 3:

SELECT *
FROM accounts
WHERE (account_no,
       effective_date) IN
    (SELECT account_no,
            max(effective_date)
     FROM accounts
     GROUP BY account_no);

Explain Analyze:

Nested Loop  (cost=406416.19..502216.84 rows=2763695 width=146) (actual time=31779.457..917543.228 rows=1977871 loops=1)
  ->  HashAggregate  (cost=406416.19..406757.45 rows=34126 width=43) (actual time=31774.877..33378.968 rows=1977425 loops=1)
        ->  Subquery Scan on "ANY_subquery"  (cost=397884.72..404709.90 rows=341259 width=43) (actual time=27979.226..29841.217 rows=1977425 loops=1)
              ->  HashAggregate  (cost=397884.72..401297.31 rows=341259 width=18) (actual time=27979.224..29315.346 rows=1977425 loops=1)
                    ->  Seq Scan on accounts  (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.851..16092.755 rows=11034070 loops=1)
  ->  Index Scan using accounts_idx2 on accounts  (cost=0.00..2.78 rows=1 width=146) (actual time=0.443..0.445 rows=1 loops=1977425)
        Index Cond: (((account_no)::text = ("ANY_subquery".account_no)::text) AND ((effective_date)::text = "ANY_subquery".max))
Total runtime: 918039.614 ms

The estimated cost is ~502,000 but the actual time taken is ~15.3 minutes!
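
To make the inconsistency concrete, dividing each plan's estimated total cost by its measured runtime gives an implied "cost units per second" rate (rounded figures, derived from the numbers above):

Query 1: 1,201,111 / 115.9 s ≈ 10,400 units/s
Query 2: 1,201,111 /  65.7 s ≈ 18,300 units/s
Query 3:   502,217 / 918.0 s ≈    550 units/s

If cost mapped even loosely onto time, these rates would be comparable; instead they span a factor of more than 30, which is why extrapolating query 3's runtime from queries 1 and 2 fails so badly.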

  • How reliable is the EXPLAIN output?
  • Do we always have to run EXPLAIN ANALYZE to see how a query will perform on real data, rather than trusting how much the planner thinks it will cost? (See the sketch below.)
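
For reference, plain EXPLAIN only prints the planner's estimates, while EXPLAIN ANALYZE actually executes the statement and reports real row counts and timings (the BUFFERS option is optional):

EXPLAIN
SELECT * FROM accounts;              -- estimates only, the query is not run

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM accounts;              -- the query is executed; actual times reported

Because EXPLAIN ANALYZE really executes the statement, wrap it in BEGIN ... ROLLBACK when testing data-modifying queries.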
Asked by Population on 16/1, 2014 at 6:29. Comments (5):

  • Cost is an arbitrary number. Costs are only relative to each other; they have no units and no external meaning. You could estimate a rough conversion factor from query cost to execution time for your machine by comparing cost estimates to execution times for a bunch of queries, but that's about the only way. The reliability of cost estimates depends largely on how good a job the planner does, how up to date and detailed your table stats are, and whether you're encountering any known cost-estimation issues such as correlated columns. (Grabble)
  • "You could estimate a rough conversion factor from query cost to execution time for your machine by comparing cost estimate to execution time for a bunch of queries." That rough conversion factor is totally useless in the above case. If I roughly estimate the cost-to-time conversion factor from queries 1 and 2, I would think query 3 shouldn't take more than 45 seconds. But it takes more than 15 minutes. Why? (Population)
  • In other words, the cost appears to be highly misleading. If I trusted the cost, I would have picked query 3 over query 2, but the actual execution times show that I should really be picking query 2 over query 3. (Population)
  • The planner is certainly mis-estimating that one. Hard to say why without more digging. The rowcount estimates are sane (see explain.depesz.com/s/4Jn). It looks like the index scan is taking far longer than Pg expects it to. Is random_page_cost too low to reflect reality? Things like this need some digging, I'm afraid. (Grabble)
  • So costs aren't really a good estimate of query execution time, not least because being within a factor of 10 or so is generally considered pretty good. They're planner cost parameters, for comparing alternative plans, and that's about it. (Grabble)
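
Following up on the random_page_cost suggestion in the comments, one way to test that theory is to inspect the setting and raise it for the current session only (6.0 is an arbitrary trial value, not a recommendation), then re-plan query 3 and see whether its estimated cost rises to better reflect its relative runtime:

SHOW random_page_cost;        -- stock default is 4.0

SET random_page_cost = 6.0;   -- session-local trial value

EXPLAIN
SELECT *
FROM accounts
WHERE (account_no, effective_date) IN
    (SELECT account_no, max(effective_date)
     FROM accounts
     GROUP BY account_no);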

They are reliable, except for when they are not. You can't really generalize.

It looks like the planner is dramatically underestimating the number of distinct account_no values it will find (it expects 34,126 but actually finds 1,977,425). Your default_statistics_target might not be high enough to get a good estimate for this column.
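
If that is the problem, a minimal sketch of the usual remedy is to raise the statistics target for the column and re-analyze so the planner samples more values (1000 is an illustrative value, not from the original answer):

-- Per-column statistics target, then refresh the stats:
ALTER TABLE accounts ALTER COLUMN account_no SET STATISTICS 1000;
ANALYZE accounts;

-- Or raise the session-wide default before analyzing:
SET default_statistics_target = 1000;
ANALYZE accounts;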

Answered by Counterinsurgency on 16/1, 2014 at 23:37. Comments (2):

  • That's a good tip! I guess one clue that the estimate is bad is when the estimated number of rows doesn't come close to the actual number of rows. (Population)
  • "They are reliable, except for when they are not. You can't really generalize." This line is pretty helpful! I wonder how this answer even got accepted. (Sturrock)
