How to use percentile_conts with multiple quantiles in Postgres
Asked Answered
S

1

5

I currently have a query that works like so:

select AVG(t2 - t1) as delay,
       percentile_cont(0.25) within group (order by (t2 - t1)) as q25,
       percentile_cont(0.5) within group (order by (t2 - t1)) as median,
       percentile_cont(0.75) within group (order by (t2 - t1)) as q75,
       p.bool1,
       p.cat1
from people p
group by p.bool1, p.cat1
order by p.cat1,p.bool1

However, I read on the postgres functions aggregation page: https://www.postgresql.org/docs/9.4/functions-aggregate.html

That I should be able to specify multiple quantiles:

percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)  double precision[]  double precision or interval    array of sort expression's type     multiple continuous percentile: returns an array of results matching the shape of the fractions parameter, with each non-null element replaced by the value corresponding to that percentile

I'd like to use this so I don't recalculate the t2 - t1 for every quantile. What's the right syntax to get multiple quantiles? Would I need a subquery?

Secularism answered 16/3, 2021 at 20:1 Comment(0)
R
12

I'd like to use this so I don't recalculate the t2 - t1 for every quantile

Lateral join could help in such scenario:

select AVG(t2 - t1) as delay,
       percentile_cont(0.25) within group (order by s.col) as q25,
       percentile_cont(0.5) within group (order by s.col) as median,
       percentile_cont(0.75) within group (order by s.col) as q75,
       p.bool1,
       p.cat1
from people p
,LATERAL(SELECT t2 - t1 AS col) s
group by p.bool1, p.cat1
order by p.cat1,p.bool1;

Related: PostgreSQL: using a calculated column in the same query


Array are defined as: ARRAY[0.25, 0.5, 0.75] or '{0.25, 0.5, 0.75}'::double precision[]

select AVG(t2 - t1) as delay,
   -- 1
   percentile_cont(ARRAY[0.25, 0.5, 0.75]) within group (order by (t2 - t1)) as q25,
   -- 2
   percentile_cont('{0.25, 0.5, 0.75}'::double precision[]) 
   within group (order by (t2 - t1)) as q
       p.bool1,
       p.cat1
from people p
group by p.bool1, p.cat1
order by p.cat1,p.bool1;

db<>fiddle demo


Is there an easy way to inline specify the names of each of the resulting percentile fields as they had been with q25, q50, q75, etc

WITH cte AS (
    select AVG(t2 - t1) as delay,
       percentile_cont(ARRAY[0.25, 0.5, 0.75]) within group (order by (t2 - t1)) as q,
           p.bool1,
           p.cat1
    from people p
    group by p.bool1, p.cat1
)
select cte.*, q[1] AS q25, q[2] AS q50, q[3] AS q75
from cte
order by cat1,bool1;

db<>fiddle demo2

Rozellarozelle answered 25/3, 2021 at 16:57 Comment(3)
Thanks! The array declaration was the key insight I needed. I didn't understand why it wasn't working with a variadic list of doubles!Secularism
Is there an easy way to inline specify the names of each of the resulting percentile fields as they had been with q25, q50, q75, etc?Secularism
@Secularism Yes, you could access each element of result array and add an aliasRozellarozelle

© 2022 - 2024 — McMap. All rights reserved.