Django conditional Subquery aggregate
Asked Answered
W

3

12

An simplified example of my model structure would be

class Corporation(models.Model):
    ...

class Division(models.Model):
    corporation = models.ForeignKey(Corporation)

class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()

Now I want to display a table that display corporations where a column will contain the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.

class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )

The problem here is that this absolutely murders performance due to the N+1 problem.

I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I havn't been able to get the results I need.

Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.

I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)

EDIT: Example of raw sql solution

corps = Corporation.objects.raw("""
SELECT
*,
(
    SELECT COUNT(*)
    FROM foo_division div ON div.corporation_id = c.id
    JOIN foo_department dept ON dept.division_id = div.id
    WHERE dept.type = 10
) as type_10_count
FROM foo_corporation c
""")
Waikiki answered 25/6, 2018 at 14:11 Comment(0)
A
16

I think with Subquery we can get SQL similar to one you have provided, with this code

# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()

# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)

The generated SQL is

SELECT "corporation"."id", ... (other fields) ...,
  (
    SELECT COUNT("division"."id") AS "count"
    FROM "department"
    INNER JOIN "division" ON ("department"."division_id" = "division"."id") 
    WHERE (
      "department"."type" = 10 AND
      "division"."corporation_id" = ("corporation"."id")
    ) GROUP BY "division"."corporation_id"
  ) AS "departments_of_type_10"
FROM "corporation"

Some concerns here is that subquery can be slow with large tables. However, database query optimizers can be smart enough to promote subquery to OUTER JOIN, at least I've heard PostgreSQL does this.

1. GROUP BY using .values and .annotate

2. order_by() problems

3. Subquery

Alcala answered 2/7, 2018 at 21:29 Comment(2)
A good answer. And a great example of the claim that the ORM is a leaky abstraction. The elegance certainly ends here, as raw SQL becomes more readable. The ORM still has the plus that it remains composable, which would be the main argument for this solution. ThanksWaikiki
@Waikiki Thanks. Yes, this is the sad truth. In some point of time I had to master such complicated for Django ORM things just because I needed them to be composable, with all that table aliasing and so on.. I would say for something like SQLAlchemy it shouldn't be that hard to express and still get all the ORM benefits. That's Django's not-the-best ORM design which completely fails on something more complex than WHERE or simple INNER/LOUTER JOIN without conditions.Alcala
D
4

You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:

from django.db.models import Case, IntegerField, Sum, When, Value

Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
Dangerfield answered 28/6, 2018 at 7:15 Comment(0)
I
1

I like the following way of doing it:

departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    count=Func('id', 'Count')
).values('count').order_by()

corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(depatments)
)

The more details on this method you can see in this answer: https://mcmap.net/q/159876/-django-1-11-annotating-a-subquery-aggregate

Iodine answered 2/9, 2021 at 13:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.