Percentile calculation with a window function
Asked Answered
U

2

8

I know you can get the average, total, min, and max over a subset of the data using a window function. But is it possible to get, say, the median, or the 25th percentile instead of the average with the window function?

Put another way, how do I rewrite this to get the id and the 25th or 50th percentile sales numbers within each district rather than the average?

SELECT id, avg(sales)
    OVER (PARTITION BY district) AS district_average
FROM t
Utoaztecan answered 25/9, 2016 at 4:20 Comment(0)
L
16

You can write this as an aggregation function using percentile_cont() or percentile_disc():

select district, percentile_cont(0.25) within group (order by sales)
from t
group by district;

Unfortunately, Postgres doesn't currently support these as a window functions:

select id, percentile_cont(0.25) within group (order by sales) over (partition by district) 
from t;

So, you can use a join:

select t.*, p_25, p_75
from t join
     (select district,
             percentile_cont(0.25) within group (order by sales) as p_25,
             percentile_cont(0.75) within group (order by sales) as p_75
      from t
      group by district
     ) td
     on t.district = td.district
Linneman answered 25/9, 2016 at 4:25 Comment(2)
When I do this, I get the following error: OVER is not supported for ordered-set aggregate percentile_cont ...which is weird because I have Postgres 9.5, and I thought it became supported in 9.4?Utoaztecan
@StephenSmith . . . The documentation -- if you read it carefully -- is clear that the percentile functions are only aggregation functions and not window functions.Linneman
T
1

Another way to to this without joining as in Gordon's solution is by exploiting the array_agg function which can be used as a window function:

create function pg_temp.percentile_cont_window
(c double precision[], p double precision)
returns double precision
language sql
as
$$
with t1 as (select unnest(c) as x)
    select percentile_cont(p) WITHIN GROUP (ORDER BY x) from t1;
$$
;

-- -- -- -- -- -- -- -- --
-- Usage examples:

create temporary table t1 as (
          select 1 as g, 1 as x
    union select 1 as g, 2 as x
    union select 2 as g, 3 as x
);

-- Built-in function raises an error if used without group:
-- Error: OVER is not supported for ordered-set aggregate percentile_cont
select *, percentile_cont(.1) within group (order by x) over() from t1;

-- Built-in function with grouping
select g, percentile_cont(.1) within group (order by x) from t1 group by g;

-- |   g |   percentile_cont |
-- |----:|------------------:|
-- |   1 |               1.1 |
-- |   2 |               3   |

-- Custom function basic usage (note that this is without grouping)
select t1.*, pg_temp.percentile_cont_window(array_agg(x) over(), .1) from t1;

-- |   g |   x |   percentile_cont_window |
-- |----:|----:|-------------------------:|
-- |   1 |   2 |                      1.2 |
-- |   1 |   1 |                      1.2 |
-- |   2 |   3 |                      1.2 |


-- Custom function usage with grouping is the same as using the built-in percentile_cont function
select t1.g, pg_temp.percentile_cont_window(array_agg(x), .1) from t1 group by g;

-- |   g |   percentile_cont_window |
-- |----:|-------------------------:|
-- |   2 |                      3   |
-- |   1 |                      1.1 |

Troublemaker answered 5/8, 2022 at 15:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.