Applying Multiple Window Functions On Same Partition
Asked Answered
Z

2

30

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)

For example you can do

SELECT name, first_value() over (partition by name order by date) from table1

But is there a way to do something like:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

Where we are applying two functions onto the same window?

Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html

Zabrine answered 13/12, 2009 at 10:25 Comment(0)
G
35

Can you not just use the window per selection

Something like

SELECT  name, 
        first_value() OVER (partition by name order by date) as f, 
        last_value() OVER (partition by name order by date) as l 
from table1

Also from your reference you can do it like this

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
Gnatcatcher answered 13/12, 2009 at 10:29 Comment(4)
Is it still the most efficient query when the number of PARTITION BY increases ?Cacodemon
You need to use parameter in first_value() and last_value(). I guess it should be date.Impervious
@SkippyleGrandGourou according to the Postgres documentation, using the exact same PARTITION BY and ORDER BY clauses will guarantee that all window functions will use the same single pass over the data. postgresql.org/docs/9.1/…Decorticate
According to the documentation, the WINDOW clause "saves typing". 😆Libove
C
18

Warning : I don't delete this answer since it seems technically correct and therefore may be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want to do anyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see proof at the end of the answer).

Though it doesn't improve performance per se, if you use multiple times the same partition, you probably want to use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.

Consider the following query :

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition) :

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

This is a big mistake, as it will take much longer. Proof :

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)

Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BY clause on a partition which doesn't need it by itself.

Here comes the WINDOW syntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query :

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY bar)

Post-warning update :

I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example :

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
 (4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 lines)
Cacodemon answered 1/10, 2014 at 12:13 Comment(7)
In the warning, it's said, "That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)." Why is it not?Angelinaangeline
I can hardly believe SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is NOT equivalent to SELECT avg(foo) OVER (PARTITION BY bar) as far we're discussing result set. Please prove, they differ. The execution time may of course vary (as you build two unnamed windows instead of one), but results are still the same.Mattison
@Mattison Just run the WINDOW command of my answer on this minimal example : create table foobar(foo float, bar int); insert into foobar values (1,1); insert into foobar values (3,1);, with and without the ORDER BY.Cacodemon
@Angelinaangeline I'm not into SQL anymore enough to reliably answer the why, but try the example in my previous comment (maybe add some lines to make it more obvious), the array_agg() output will give some hint.Cacodemon
@Skippy Damn, you've changed my understanding of PARTITION, thanks! (For those interested: with ORDER BY it returns rows of {1; 2}, and without it returns {2; 2}.Mattison
THANK YOU. I understand now. Here's another pretty good example: postgresql.org/docs/9.1/static/tutorial-window.html. Then, the ORDER BY sorts the rows within the window, and processes them in that order including only the current and previously seen rows, ignoring rows that are after current or not in the window.Angelinaangeline
AVG() applies to the window frame, not the partion. Adjust your frame to include the entire partition by adding a frame_clause of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to your window_definition. A window_definition without an ORDER BY applies to the entire partition by default. See this db-fiddle.Libove

© 2022 - 2024 — McMap. All rights reserved.