Warning: I am leaving this answer up because it seems technically correct and may therefore be helpful, but beware that PARTITION BY bar ORDER BY foo
is probably not what you want anyway. With an ORDER BY in the window, an aggregate function is not computed over the partition as a whole: the default frame only runs from the start of the partition to the current row (and its peers in the ordering), so you get a running aggregate. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)
(see the proof at the end of the answer).
Though it doesn't improve performance per se, if you use the same partition several times, you probably want to use the second syntax proposed by astander, and not only because it is shorter to write. Here is why.
Consider the following query:
SELECT
array_agg(foo)
OVER (PARTITION BY bar ORDER BY foo),
avg(baz)
OVER (PARTITION BY bar ORDER BY foo)
FROM
foobar;
Since the ordering may look irrelevant to the computation of an average, you might be tempted to use the following query instead (no ordering on the second partition):
SELECT
array_agg(foo)
OVER (PARTITION BY bar ORDER BY foo),
avg(baz)
OVER (PARTITION BY bar)
FROM
foobar;
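This is a mistake, as it takes noticeably longer (about 3.06 s versus 2.46 s here): the two different window definitions require two WindowAgg nodes instead of one. Proof: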
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
-> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
Sort Key: bar, foo
Sort Method: quicksort Memory: 130006kB
-> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
Total runtime: 2458.969 ms
(6 rows)
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
-> WindowAgg (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
-> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
Sort Key: bar, foo
Sort Method: quicksort Memory: 130006kB
-> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
Total runtime: 3060.041 ms
(7 rows)
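To reproduce this comparison locally, something like the following sketch could be used. The column types, value ranges and random data are assumptions (the original table definition is not shown); only the table name, the column names and the row count come from the plans above. Absolute timings will differ from machine to machine.

```sql
-- Hypothetical test table: three integer columns filled with random data.
-- Only the names foobar/foo/bar/baz and the row count come from the answer.
CREATE TABLE foobar AS
SELECT
    (random() * 1000000)::int AS foo,
    (random() * 1000)::int    AS bar,
    (random() * 100)::int     AS baz
FROM
    generate_series(1, 1724882);

-- Refresh planner statistics before timing anything.
ANALYZE foobar;

-- One window definition: the plan should contain a single WindowAgg node.
EXPLAIN ANALYZE
SELECT
    array_agg(foo) OVER (PARTITION BY bar ORDER BY foo),
    avg(baz)       OVER (PARTITION BY bar ORDER BY foo)
FROM
    foobar;

-- Two different window definitions: expect a second WindowAgg node.
EXPLAIN ANALYZE
SELECT
    array_agg(foo) OVER (PARTITION BY bar ORDER BY foo),
    avg(baz)       OVER (PARTITION BY bar)
FROM
    foobar;
```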
Now, if you are aware of this issue, you will of course use the same window everywhere. But when the same window appears ten times or more and you keep editing the query over several days, it is quite easy to forget the ORDER BY
clause on a window that does not need it by itself.
This is where the WINDOW
syntax comes in: it protects you from such careless mistakes (provided, of course, you are aware that it is better to minimize the number of different window definitions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE
) to the first query:
SELECT
array_agg(foo)
OVER qux,
avg(baz)
OVER qux
FROM
foobar
WINDOW
qux AS (PARTITION BY bar ORDER BY foo);
Post-warning update:
I understand that the statement "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)
" may seem questionable, so here is an example:
# SELECT * FROM foobar;
foo | bar
-----+-----
1 | 1
2 | 2
3 | 1
4 | 2
(4 rows)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
array_agg | avg
-----------+-----
{1,3} | 2
{1,3} | 2
{2,4} | 3
{2,4} | 3
(4 rows)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
array_agg | avg
-----------+-----
{1} | 1
{1,3} | 2
{2} | 2
{2,4} | 3
(4 rows)
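If you need both the ordering and whole-partition aggregates, one option is to spell out the frame explicitly. The sketch below reuses the four-row foobar table from the example; the frame clause is an addition for illustration and is not part of the original queries. With an ORDER BY and no frame clause, the default frame stops at the current row, which is what produces the running values above.

```sql
-- Same window as above, but with an explicit frame covering the whole partition.
-- Without the frame clause, ORDER BY implies
-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
SELECT
    array_agg(foo) OVER qux,
    avg(foo)       OVER qux
FROM
    foobar
WINDOW
    qux AS (
        PARTITION BY bar
        ORDER BY foo
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    );
```

Each row should then see every foo of its partition (in foo order), as in the PARTITION BY bar query, while the window definition stays shared by both aggregates.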