Chaining multiple filter() in Django, is this a bug?
Asked Answered
M

6

164

I always assumed that chaining multiple filter() calls in Django was always the same as collecting them in a single call.

# Equivalent
Model.objects.filter(foo=1).filter(bar=2)
Model.objects.filter(foo=1,bar=2)

but I have run across a complicated queryset in my code where this is not the case

class Inventory(models.Model):
    book = models.ForeignKey(Book)

class Profile(models.Model):
    user = models.OneToOneField(auth.models.User)
    vacation = models.BooleanField()
    country = models.CharField(max_length=30)

# Not Equivalent!
Book.objects.filter(inventory__user__profile__vacation=False).filter(inventory__user__profile__country='BR')
Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')

The generated SQL is

SELECT "library_book"."id", "library_book"."asin", "library_book"."added", "library_book"."updated" FROM "library_book" INNER JOIN "library_inventory" ON ("library_book"."id" = "library_inventory"."book_id") INNER JOIN "auth_user" ON ("library_inventory"."user_id" = "auth_user"."id") INNER JOIN "library_profile" ON ("auth_user"."id" = "library_profile"."user_id") INNER JOIN "library_inventory" T5 ON ("library_book"."id" = T5."book_id") INNER JOIN "auth_user" T6 ON (T5."user_id" = T6."id") INNER JOIN "library_profile" T7 ON (T6."id" = T7."user_id") WHERE ("library_profile"."vacation" = False  AND T7."country" = BR )
SELECT "library_book"."id", "library_book"."asin", "library_book"."added", "library_book"."updated" FROM "library_book" INNER JOIN "library_inventory" ON ("library_book"."id" = "library_inventory"."book_id") INNER JOIN "auth_user" ON ("library_inventory"."user_id" = "auth_user"."id") INNER JOIN "library_profile" ON ("auth_user"."id" = "library_profile"."user_id") WHERE ("library_profile"."vacation" = False  AND "library_profile"."country" = BR )

The first queryset with the chained filter() calls joins the Inventory model twice effectively creating an OR between the two conditions whereas the second queryset ANDs the two conditions together. I was expecting that the first query would also AND the two conditions. Is this the expected behavior or is this a bug in Django?

The answer to a related question Is there a downside to using ".filter().filter().filter()..." in Django? seems to indicated that the two querysets should be equivalent.

Mistress answered 17/11, 2011 at 9:17 Comment(0)
L
155

The way I understand it is that they are subtly different by design (and I am certainly open for correction): filter(A, B) will first filter according to A and then subfilter according to B, while filter(A).filter(B) will return a row that matches A 'and' a potentially different row that matches B.

Look at the example here:

https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

particularly:

Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects

...

In this second example (filter(A).filter(B)), the first filter restricted the queryset to (A). The second filter restricted the set of blogs further to those that are also (B). The entries select by the second filter may or may not be the same as the entries in the first filter.`

Lydie answered 17/11, 2011 at 9:36 Comment(12)
Does this mean that the link to the SO question is actually incorrect?Townley
This behavior, although documented, seems to violate the principle of least astonishment. Multiple filter()'s AND together when fields are on the same model, but then OR together when spanning relationships.Mistress
I believe you have it the wrong way around in the first paragraph - filter(A, B) is the AND situation ('lennon' AND 2008 in the docs), while filter(A).filter(B) is the OR situation ('lennon' OR 2008). This makes sense when you look at the queries generated in the question - the .filter(A).filter(B) case creates the joins twice, resulting in an OR.Unsightly
filter(A, B) is the AND filter(A).filter(B) is ORFundus
filter(A,B) doesn't work as intended when A and B are Q() objects containing AND or OR operators.Catinacation
so further restrict means less restrictive?Agist
This answer is incorrect. It is not "OR." This sentence "The second filter restricted the set of blogs further to those that are also (B)." clearly mentions "that are also (B)." If you observe a behavior similar to OR in this specific example, it does not necessarily mean that you can generalize your own interpretation. Please look at the answers by "Kevin 3112" and "Johnny Tsang." I believe those are the correct answers.Jaworski
Yep, it is documented in Django. But documenting bad design does not make it good design. There are certain situation where filters appear to be nested. For example, in DRF I add some filtering in get_queryset(self) method of a viewset and also use DRF filter backend out of the box. Filters appear to be nested and it works well if applied to the same model, but breaks (as designed) once filtering is done on a related model. It is definitely a violation of principle of least astonishment (at least).Marcionism
Django has a built-in Q objects that allows this kind of lookups Usage : from django.db.models import Q Item.objects.filter(Q(A),Q(B)); // A Or B Item.objects.filter(A,B); // A AND B Leatherjacket
@anouares-sayid I believe you are mistaken, Q(A) | Q(B) means OR.Incision
They're both ANDs, just applied in different places. The chained version filters books linked to a profile that has vacation=False AND linked to a profile that has country='BR'. The single filter version filters books linked to a profile that has vacation=False AND country='BR'.Mallorymallow
This answer is not correct, it depends if the target fields in the filters are relationships or not !!Sanitary
P
110

These two style of filtering are equivalent in most cases, but when query on objects base on ForeignKey or ManyToManyField, they are slightly different.

Examples from the documentation.

model
Blog to Entry is a one-to-many relation.

from django.db import models

class Blog(models.Model):
    ...

class Entry(models.Model):
    blog = models.ForeignKey(Blog)
    headline = models.CharField(max_length=255)
    pub_date = models.DateField()
    ...

objects
Assuming there are some blog and entry objects here.
enter image description here

queries

Blog.objects.filter(entry__headline_contains='Lennon', 
    entry__pub_date__year=2008)
Blog.objects.filter(entry__headline_contains='Lennon').filter(
    entry__pub_date__year=2008)  
    

For the 1st query (single filter one), it match only blog1.

For the 2nd query (chained filters one), it filters out blog1 and blog2.
The first filter restricts the queryset to blog1, blog2 and blog5; the second filter restricts the set of blogs further to blog1 and blog2.

And you should realize that

We are filtering the Blog items with each filter statement, not the Entry items.

So, it's not the same, because Blog and Entry are multi-valued relationships.

Reference: https://docs.djangoproject.com/en/1.8/topics/db/queries/#spanning-multi-valued-relationships
If there is something wrong, please correct me.

Edit: Changed v1.6 to v1.8 since the 1.6 links are no longer available.

Pentosan answered 31/1, 2015 at 16:3 Comment(2)
You seem to be mixed up between "matches" and "filters out". If you stuck to "this query returns" it would be a lot clearer.Chopstick
Very good schematic example which is clarifying the difference betweent the two.Fosterling
S
10

As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.

Example1 (same joined table) :

(example from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships)

Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)

This will give you all the Blogs that have one entry with both (entry_headline_contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query. Result: Book with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}

Example 2 (chained)

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

This will cover all the results from Example 1, but it will generate slightly more result. Because it first filters all the blogs with (entry_headline_contains='Lennon') and then from the result filters (entry__pub_date__year=2008).

The difference is that it will also give you results like: Book with {entry.headline: 'Lennon', entry.pub_date: 2000}, {entry.headline: 'Bill', entry.pub_date: 2008}

In your case

I think it is this one you need:

Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')

And if you want to use OR please read: https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects

Spoliation answered 14/6, 2012 at 1:37 Comment(3)
The second example isn't actually true. All the chained filters are applied to the queried objects, i.e. they are ANDed together in the query.Quadragesimal
I believe that Example 2 is correct, and it is actually a explanation taken from the official Django documents, as referenced. I might not be the best explainer and I pardon for that. Example 1 is a direct AND as you would expect in a normal SQL writing. Example 1 gives something like this: 'SELECT blog JOIN entry WHERE entry.head_line LIKE "Lennon" AND entry.year == 2008 Example 2 gives something like this: 'SELECT blog JOIN entry WHERE entry.head_list LIKE "Lennon" UNION SELECT blog JOIN entry WHERE entry.head_list LIKE "Lennon"'Spoliation
Sir, you are quite right. In a hurry I missed the fact that our filtering criteria is pointing to a one-to-many relation, not to the blog itself.Quadragesimal
S
10

From Django docs :

To handle both of these situations, Django has a consistent way of processing filter() calls. Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.

  • It is clearly said that multiple conditions in a single filter() are applied simultaneously. That means that doing :
objs = Mymodel.objects.filter(a=True, b=False)

will return a queryset with raws from model Mymodel where a=True AND b=False.

  • Successive filter(), in some case, will provide the same result. Doing :
objs = Mymodel.objects.filter(a=True).filter(b=False)

will return a queryset with raws from model Mymodel where a=True AND b=False too. Since you obtain "first" a queryset with records which have a=True and then it's restricted to those who have b=False at the same time.

  • The difference in chaining filter() comes when there are multi-valued relations, which means you are going through other models (such as the example given in the docs, between Blog and Entry models). It is said that in that case (...) they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.

Which means that it applies the successives filter() on the target model directly, not on previous filter()

If I take the example from the docs :

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

remember that it's the model Blog that is filtered, not the Entry. So it will treat the 2 filter() independently.

It will, for instance, return a queryset with Blogs, that have entries that contain 'Lennon' (even if they are not from 2008) and entries that are from 2008 (even if their headline does not contain 'Lennon')

THIS ANSWER goes even further in the explanation. And the original question is similar.

Sanitary answered 3/12, 2020 at 0:34 Comment(0)
W
1

Sometimes you don't want to join multiple filters together like this:

def your_dynamic_query_generator(self, event: Event):
    qs \
    .filter(shiftregistrations__event=event) \
    .filter(shiftregistrations__shifts=False)

And the following code would actually not return the correct thing.

def your_dynamic_query_generator(self, event: Event):
    return Q(shiftregistrations__event=event) & Q(shiftregistrations__shifts=False)

What you can do now is to use an annotation count-filter.

In this case we count all shifts which belongs to a certain event.

qs: EventQuerySet = qs.annotate(
    num_shifts=Count('shiftregistrations__shifts', filter=Q(shiftregistrations__event=event))
)

Afterwards you can filter by annotation.

def your_dynamic_query_generator(self):
    return Q(num_shifts=0)

This solution is also cheaper on large querysets.

Hope this helps.

Wren answered 27/4, 2019 at 14:30 Comment(0)
G
-3

Saw this in a comment and I thought it was the simplest explanation.

filter(A, B) is the AND ; filter(A).filter(B) is OR

It's true if every linked model satisfies both conditions

Glutathione answered 26/12, 2020 at 13:33 Comment(2)
@Sanitary is this ever true?Recapture
It's true if every linked model satisfies both conditions ;)Mallorymallow

© 2022 - 2024 — McMap. All rights reserved.