Configuration parameter work_mem in PostgreSQL on Linux

I have to optimize queries by tuning basic PostgreSQL server configuration parameters. In the documentation I came across the work_mem parameter. Then I checked how changing this parameter influences the performance of my query (which uses a sort). I measured query execution time with various work_mem settings and was very disappointed.

The table on which I perform my query contains 10,000,000 rows and there are 430 MB of data to sort. (Sort Method: external merge Disk: 430112kB).
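Each timing below was taken by setting work_mem for the session and running the query under EXPLAIN ANALYZE, along these lines (a sketch; the full query is also shown further down):

SET work_mem = '1MB';  -- then '5MB', then '64MB'
EXPLAIN ANALYZE
SELECT n
FROM   (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;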

With work_mem = 1MB, EXPLAIN output is:

Total runtime: 29950.571 ms (sort takes about 19300 ms).
Sort  (cost=4032588.78..4082588.66 rows=19999954 width=8)
      (actual time=22577.149..26424.951 rows=20000000 loops=1)
      Sort Key: "*SELECT* 1".n
      Sort Method:  external merge  Disk: 430104kB

With work_mem = 5MB:

Total runtime: 36282.729 ms (sort: 25400 ms).
Sort  (cost=3485713.78..3535713.66 rows=19999954 width=8) 
      (actual time=25062.383..33246.561 rows=20000000 loops=1)
      Sort Key: "*SELECT* 1".n
      Sort Method:  external merge  Disk: 430104kB

With work_mem = 64MB:

Total runtime: 42566.538 ms (sort: 31000 ms).
Sort  (cost=3212276.28..3262276.16 rows=19999954 width=8)
      (actual time=28599.611..39454.279 rows=20000000 loops=1)
      Sort Key: "*SELECT* 1".n
      Sort Method:  external merge  Disk: 430104kB

Can anyone explain why performance gets worse? Or suggest any other methods to make query execution faster by changing server parameters?

My query (I know it's not optimal, but I have to benchmark this kind of query):

SELECT n
FROM   (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;

Full execution plan:

Sort  (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
Sort Key: q1.n
Sort Method:  quicksort  Memory: 25kB
->  Subquery Scan q1  (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
    ->  SetOp Except  (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
           ->  Sort  (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
                 Sort Key: "*SELECT* 1".n
                 Sort Method:  external merge  Disk: 430104kB
                 ->  Append  (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
                       ->  Subquery Scan "*SELECT* 1"  (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
                             ->  Seq Scan on table_name  (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
                       ->  Subquery Scan "*SELECT* 2"  (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
                             ->  Seq Scan on table_name  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms
Contraoctave asked 12/11, 2011 at 17:27 Comment(4)
Is there another merge in one of the subqueries that shifts from external merge, nested loop, or index loop to a hash map when you increase work_mem?Appleton
I've edited my post and included the query and execution plan.Contraoctave
Your query does not match the EXPLAIN ANALYZE output. You make this harder than it needs to be. Also, you might want to know: only the OP is alerted to a comment automatically. Others you'll have to address explicitly like this @Grzes. But some limitations apply. Read more here: meta.stackexchange.com/questions/43019/…Stagy
@Erwin: It doesn't match because I changed the table name and parameter name in the query. (I will correct it.) But the query plan does correspond to the query.Contraoctave

I posted your query plan on explain.depesz.com; have a look.

The query planner's estimates are terribly wrong in some places. Have you run ANALYZE recently?

Read the chapters in the manual on Statistics Used by the Planner and Planner Cost Constants. Pay special attention to the sections on random_page_cost and default_statistics_target.
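To quickly check the current values of those two settings (standard SHOW commands, nothing specific to your setup):

SHOW random_page_cost;            -- default is 4.0
SHOW default_statistics_target;   -- default is 100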
You might try:

ALTER TABLE diplomas ALTER COLUMN number SET STATISTICS 1000;
ANALYZE diplomas;

Or go even higher for a table with 10M rows. It depends on data distribution and actual queries. Experiment. The default is 100, the maximum is 10000.

For a database of that size, a work_mem of just 1 or 5 MB is generally not enough. Read the Postgres Wiki page on Tuning Postgres that @aleroot linked to.

As your query needs 430104kB on disk according to the EXPLAIN output, you have to set work_mem to something like 500MB or more to allow in-memory sorting. The in-memory representation of data needs somewhat more space than the on-disk representation. You may be interested in what Tom Lane posted on that matter recently.

Increasing work_mem by just a little, like you tried, won't help much and may even slow things down. Setting it too high globally can even hurt, especially with concurrent access: multiple sessions might starve one another for resources. Allocating more for one purpose takes memory away from another if the resource is limited. The best setup depends on the complete situation.

To avoid side effects, set it high enough only locally in your session, and only temporarily for the query:

SET work_mem = '500MB';

Reset it to your default afterwards:

RESET work_mem;

Or use SET LOCAL to set it just for the current transaction to begin with.
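A minimal sketch of that pattern; SET LOCAL reverts automatically when the transaction ends, so no RESET is needed:

BEGIN;
SET LOCAL work_mem = '500MB';  -- in effect only until COMMIT / ROLLBACK
SELECT n
FROM   (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;
COMMIT;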

Stagy answered 13/11, 2011 at 1:9 Comment(3)
Yes @Erwin, I have run VACUUM ANALYZE. Statistics are up to date. I had also executed the query with work_mem = 450MB (19.5s instead of 30s) before you wrote your post. But such a huge work_mem value may be dangerous. I've read that lots of parallel operations (sorts, hashes) can be executed, so the total memory needed could be n * 500MB and might exceed the amount of RAM. Thanks for the links.Contraoctave
@Contraoctave If you only set work_mem for the transaction with your query, like I advised, you can control how much memory is used. All other operations will keep the default setting. Make it 500MB or more; 450MB is probably not quite enough.Stagy
Oh, I have just woken up :) Maybe that's why I didn't notice "only temporarily for this query". Thank you.Contraoctave
SET search_path = 'tmp';
-- Generate some test data (100K rows with ~0.2% deleted, matching the plans below):
-- DROP TABLE tmp.table_name;
-- CREATE TABLE tmp.table_name (n INTEGER NOT NULL PRIMARY KEY);
-- INSERT INTO tmp.table_name(n) SELECT generate_series(1, 100000);
-- DELETE FROM tmp.table_name WHERE random() < 0.002;

The EXCEPT query is equivalent to the following NOT EXISTS form, which generates a different query plan (but the same results) here (on 9.0.1beta-something):

-- EXPLAIN ANALYZE
WITH q1 AS (
    SELECT 1+tn.n  AS n
    FROM table_name tn
    WHERE NOT EXISTS (
        SELECT * FROM table_name nx
        WHERE nx.n = tn.n+1
        )   
    )
SELECT q1.n
FROM q1
ORDER BY q1.n DESC;

(a version with a recursive CTE might also be possible :-)
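For instance, something along these lines (an untested sketch; it walks the whole range from min(n) to max(n) and returns every missing value, i.e. all gap members rather than only gap starts, so the result set is not exactly the same as the EXCEPT query's):

WITH RECURSIVE walk(n) AS (
    SELECT min(n) FROM table_name
  UNION ALL
    SELECT w.n + 1
    FROM walk w
    WHERE w.n < (SELECT max(n) FROM table_name)
)
SELECT w.n
FROM walk w
WHERE NOT EXISTS (SELECT 1 FROM table_name t WHERE t.n = w.n)  -- keep only the missing values
ORDER BY w.n DESC;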

EDIT: the query plans, all for 100K records with 0.2% deleted.

Original query:

------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36461.76..36711.20 rows=99778 width=4) (actual time=2682.600..2682.917 rows=222 loops=1)
   Sort Key: q1.n
   Sort Method:  quicksort  Memory: 22kB
   ->  Subquery Scan q1  (cost=24984.41..26979.97 rows=99778 width=4) (actual time=2003.047..2682.036 rows=222 loops=1)
         ->  SetOp Except  (cost=24984.41..25982.19 rows=99778 width=4) (actual time=2003.042..2681.389 rows=222 loops=1)
               ->  Sort  (cost=24984.41..25483.30 rows=199556 width=4) (actual time=2002.584..2368.963 rows=199556 loops=1)
                     Sort Key: "*SELECT* 1".n
                     Sort Method:  external merge  Disk: 3512kB
                     ->  Append  (cost=0.00..5026.57 rows=199556 width=4) (actual time=0.071..1452.838 rows=199556 loops=1)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..2638.01 rows=99778 width=4) (actual time=0.067..470.652 rows=99778 loops=1)
                                 ->  Seq Scan on table_name  (cost=0.00..1640.22 rows=99778 width=4) (actual time=0.063..178.365 rows=99778 loops=1)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..2388.56 rows=99778 width=4) (actual time=0.014..429.224 rows=99778 loops=1)
                                 ->  Seq Scan on table_name  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.011..143.320 rows=99778 loops=1)
 Total runtime: 2684.840 ms
(14 rows)

NOT EXISTS version with CTE:

----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6394.60..6394.60 rows=1 width=4) (actual time=699.190..699.498 rows=222 loops=1)
   Sort Key: q1.n
   Sort Method:  quicksort  Memory: 22kB
   CTE q1
     ->  Hash Anti Join  (cost=2980.01..6394.57 rows=1 width=4) (actual time=312.262..697.985 rows=222 loops=1)
           Hash Cond: ((tn.n + 1) = nx.n)
           ->  Seq Scan on table_name tn  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.013..143.210 rows=99778 loops=1)
           ->  Hash  (cost=1390.78..1390.78 rows=99778 width=4) (actual time=309.923..309.923 rows=99778 loops=1)
                 ->  Seq Scan on table_name nx  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..144.102 rows=99778 loops=1)
   ->  CTE Scan on q1  (cost=0.00..0.02 rows=1 width=4) (actual time=312.270..698.742 rows=222 loops=1)
 Total runtime: 700.040 ms
(11 rows)

NOT EXISTS version without CTE:

--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6394.58..6394.58 rows=1 width=4) (actual time=692.313..692.625 rows=222 loops=1)
   Sort Key: ((1 + tn.n))
   Sort Method:  quicksort  Memory: 22kB
   ->  Hash Anti Join  (cost=2980.01..6394.57 rows=1 width=4) (actual time=308.046..691.849 rows=222 loops=1)
         Hash Cond: ((tn.n + 1) = nx.n)
         ->  Seq Scan on table_name tn  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.014..142.781 rows=99778 loops=1)
         ->  Hash  (cost=1390.78..1390.78 rows=99778 width=4) (actual time=305.732..305.732 rows=99778 loops=1)
               ->  Seq Scan on table_name nx  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..143.783 rows=99778 loops=1)
 Total runtime: 693.139 ms
(9 rows)

My conclusion is that the NOT EXISTS versions cause Postgres to produce better plans.

Appleton answered 13/11, 2011 at 10:58 Comment(4)
Interesting idea to substitute NOT EXISTS. But why the CTE? You could do it all at the same query level. I mean, CTEs are cool and everything, but performance beats style. :)Stagy
Just because I can! (Also because of its resemblance to the original.) But the different query plan suggests that the original might have been suboptimal. (This is also the case for subqueries involving aggregates; NOT EXISTS is one of my standard workarounds.)Appleton
BTW: I think this post should be retagged islands-and-gaps.Appleton
Avoiding the sort in the inner loop might pay off even better when scaling up, IMO.Appleton
