Django: select_related on a very simple relationship is not eliminating the N+1 problem
Asked Answered
B

1

0

I have a strange situation. I've done so much reading on avoiding the N+1 problem and have tried prefetching but to no avail. My setup is like this:

Models:

class A(models.Model):
    somefield1 = models.CharField(max_length=100)
    somefield2 = models.CharField(max_length=100)

    # Assume that classes Y and Z exist. We don't care about them.
    y = models.ForeignKey(Y, on_delete=models.CASCADE)
    z = models.ForeignKey(Z, on_delete=models.CASCADE)

class B(models.Model)
    a = models.ForeignKey(A, on_delete=models.CASCADE, related_name="b_objs")

Visually:

B --> A --> Y and Z (Again, we don't care about A's FKs to Y and Z)

There are 188 A objects and 1291 B objects pointing to some combination of those 188 As.

I do a simple query like so:

list(B.objects.select_related('a'))

When I log my django DB queries (https://mcmap.net/q/151293/-how-to-log-all-sql-queries-in-django), I see over 1000 calls to select individual elements of A like so (most likely because we have 1291 B objects, each of which points to one A object):

SELECT "A"."id",
       "A"."somefield1",
       "A"."somefield2",
       "A"."y_id",
       "A"."z_id"
       FROM "A"
       WHERE "A"."id" = 109 LIMIT 21; args=(109,); alias=default

The value of A.id (109) changes every query, presumably for each B object that points to one A object.

The select_related is doing something because I see the correct INNER JOIN in the first query to connect A and B, as well as get all of A's fields:

SELECT "B"."id",
       "B"."a_id",
       "A"."id",
       "A"."somefield1",
       "A"."somefield2",
       "A"."y_id",
       "A"."z_id"
       FROM "A" INNER JOIN "B" ON ("B"."a_id" = "A"."id")

In fact, this query returns everything that we would hypothetically need to build all the models! It returns 1291 rows, which is correct.

Futhermore, when I print

print(B.objects.select_related('a').query)

It only returns that one query above, not the 1291 other ones.

Where are each of the individual queries for each A object coming from then?

And does anyone know why calling select_related on a doesn't prevent these thousands of individual queries for A from happening?

Doing something more extreme like

list(B.objects.select_related())

or even

list(B.objects.select_related().prefetch_related())

to follow all foreign keys and relations only makes the first query with the INNER JOIN much more complicated, but it still results in N queries for each of the 1291 A objects pointing to B objects.

Burmeister answered 30/9, 2022 at 21:37 Comment(0)
B
0

It turned out that my select_related was correct. The culprit was a post_init receiver on B that was calling self.b.a, which resulted in the individual queries for each A object every time a B object was instantiated. Painful lesson learned.

For those facing similar issues, I was able to debug this using this amazing SQL stacktrace tool which prints a python stacktrace of where the query is being generated: https://github.com/dobarkod/django-queryinspect.

Burmeister answered 2/10, 2022 at 6:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.