Django & Postgres - percentile (median) and group by
Asked Answered
O

3

7

I need to calculate period medians per seller ID (see simplyfied model below). The problem is I am unable to construct the ORM query.

Model

class MyModel:
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)

Query

queryset = (
    MyModel.objects.filter(period=25)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
    .values("median", "seller_id")
)

ArrayField aggregation (seller_id) source


I think what I need to do is something along the lines below

select t.*, p_25, p_75
from t join
     (select district,
             percentile_cont(0.25) within group (order by sales) as p_25,
             percentile_cont(0.75) within group (order by sales) as p_75
      from t
      group by district
     ) td
     on t.district = td.district

above example source


Python 3.7.5, Django 2.2.8, Postgres 11.1

Ommiad answered 10/1, 2020 at 18:11 Comment(6)
To clarify, are you using django with SQLServer?Archaeornis
@Archaeornis there's a postgresql tag under the question so no.Capuano
yeah... sorry about thatArchaeornis
What is the error you have?Bryner
So what's your question? Whats wrong with the query you showed? Are you trying to do this using the ORM or?Primalia
Added model and example query.Capuano
G
19

You can create a Median child class of the Aggregate class as was done by Ryan Murphy (https://gist.github.com/rdmurphy/3f73c7b1826cacee34f6c2a855b12e2e). Median then works just like Avg:

    from django.db.models import Aggregate, FloatField


    class Median(Aggregate):
        function = 'PERCENTILE_CONT'
        name = 'median'
        output_field = FloatField()
        template = '%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)'

Then to find the median of a field use

    my_model_aggregate = MyModel.objects.all().aggregate(Median('period'))

which is then available as my_model_aggregate['period__median'].

Granulite answered 21/5, 2020 at 15:48 Comment(0)
O
3

Here's what did the trick.

from django.db.models import F, Func, IntegerField
from django.db.models.aggregates import Aggregate


queryset = (
    MyModel.objects.filter(period=25)
    .annotate(duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()))
    .filter(duration__isnull=False)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")  # group by
    .annotate(
        median=Aggregate(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
)

Notice the median annotation employs Aggregate and not Func as in the question. Also, order of annotate() and filter() clauses as well as order of annotate() and values() clauses matters a lot!

BTW the resulting SQL is without a nested select and join.

Ommiad answered 10/1, 2020 at 23:1 Comment(0)
T
0

As of 2024 a good solution for adding Median and Percentile aggregate operators to Django is tailslide:

Tonytonya answered 3/9, 2024 at 23:12 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.