Django filter queryset __in for *every* item in list
Asked Answered
H

9

141

Let's say I have the following models

class Photo(models.Model):
    tags = models.ManyToManyField(Tag)

class Tag(models.Model):
    name = models.CharField(max_length=50)

In a view I have a list with active filters called categories. I want to filter Photo objects which have all tags present in categories.

I tried:

Photo.objects.filter(tags__name__in=categories)

But this matches any item in categories, not all items.

So if categories would be ['holiday', 'summer'] I want Photo's with both a holiday and summer tag.

Can this be achieved?

Heterogamete answered 23/12, 2011 at 16:1 Comment(5)
Maybe: qs=Photo.objects.all(); for category in categories: qs = qs.filter(tags__name=category)Diastole
jpic is right, Photo.objects.filter(tags__name='holiday').filter(tags__name='summer') is the way to go. (This is same as jpic's example). Each filter should add more JOINs to query, so you could take annotation approach if they are too many.Geomancy
Here's the reference in the docs: docs.djangoproject.com/en/dev/topics/db/queries/…Eozoic
You would expect there to be a build-in function for this by DjangoJaquelinejaquelyn
Photo.objects.filter(tags__name_in=['holiday','summer']) won't work?? and even to avoid unnecessary data hit, Photo.objects.filter(tags_name__in=['holiday','summer']).select_related('tag').Statuary
G
166

Summary:

One option is, as suggested by jpic and sgallen in the comments, to add .filter() for each category. Each additional filter adds more joins, which should not be a problem for small set of categories.

There is the aggregation approach. This query would be shorter and perhaps quicker for a large set of categories.

You also have the option of using custom queries.


Some examples

Test setup:

class Photo(models.Model):
    tags = models.ManyToManyField('Tag')

class Tag(models.Model):
    name = models.CharField(max_length=50)

    def __unicode__(self):
        return self.name

In [2]: t1 = Tag.objects.create(name='holiday')
In [3]: t2 = Tag.objects.create(name='summer')
In [4]: p = Photo.objects.create()
In [5]: p.tags.add(t1)
In [6]: p.tags.add(t2)
In [7]: p.tags.all()
Out[7]: [<Tag: holiday>, <Tag: summer>]

Using chained filters approach:

In [8]: Photo.objects.filter(tags=t1).filter(tags=t2)
Out[8]: [<Photo: Photo object>]

Resulting query:

In [17]: print Photo.objects.filter(tags=t1).filter(tags=t2).query
SELECT "test_photo"."id"
FROM "test_photo"
INNER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
INNER JOIN "test_photo_tags" T4 ON ("test_photo"."id" = T4."photo_id")
WHERE ("test_photo_tags"."tag_id" = 3  AND T4."tag_id" = 4 )

Note that each filter adds more JOINS to the query.

Using annotation approach:

In [29]: from django.db.models import Count
In [30]: Photo.objects.filter(tags__in=[t1, t2]).annotate(num_tags=Count('tags')).filter(num_tags=2)
Out[30]: [<Photo: Photo object>]

Resulting query:

In [32]: print Photo.objects.filter(tags__in=[t1, t2]).annotate(num_tags=Count('tags')).filter(num_tags=2).query
SELECT "test_photo"."id", COUNT("test_photo_tags"."tag_id") AS "num_tags"
FROM "test_photo"
LEFT OUTER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
WHERE ("test_photo_tags"."tag_id" IN (3, 4))
GROUP BY "test_photo"."id", "test_photo"."id"
HAVING COUNT("test_photo_tags"."tag_id") = 2

ANDed Q objects would not work:

In [9]: from django.db.models import Q
In [10]: Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer'))
Out[10]: []
In [11]: from operator import and_
In [12]: Photo.objects.filter(reduce(and_, [Q(tags__name='holiday'), Q(tags__name='summer')]))
Out[12]: []

Resulting query:

In [25]: print Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer')).query
SELECT "test_photo"."id"
FROM "test_photo"
INNER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
INNER JOIN "test_tag" ON ("test_photo_tags"."tag_id" = "test_tag"."id")
WHERE ("test_tag"."name" = holiday  AND "test_tag"."name" = summer )
Geomancy answered 26/12, 2011 at 17:54 Comment(8)
Is there be a solution with a custom lookup? docs.djangoproject.com/en/1.10/howto/custom-lookups It would cool to switch "__in" to "__all" and have it create the correct sql query.Bodwell
This annotation solution seems wrong. What if there are three tags possible (lets call the additional one for t3, and a photo has the tags t2 and t3. Then this photo will still match the given query.Lait
@Lait I think the idea is that you would replace num_tags=2 with num_tags=len(tags); I expect the hard-coded 2 was just for example's sake.Trivalent
@Trivalent It still would not work. Photo.objects.filter(tags__in=tags) matches photos that have any of the tags, not only those that has all. Some of those that only has one of the desired tags, may have exactly the amount of tags that you are looking for, and some of those that has all the desired tags, may also have additional tags.Lait
If your Django version is 1.8 or above you could use conditional annotations. See this answer: https://mcmap.net/q/167887/-how-to-annotate-count-with-a-condition-in-a-django-querysetLait
@Lait the annotation counts only the tags returned by the query, so if (num tags returned by the query) == (num tags searched for) then the row is included; "extra" tags are not searched for, so won't be counted. I've verified this within my own app.Trivalent
Is it possible that Q objects would work but OR-ed instead ?Widower
The annotation approach is both elegant and efficient. I would suggest placing it higher in the answer!Hedwighedwiga
H
10

Another approach that works, although PostgreSQL only, is using django.contrib.postgres.fields.ArrayField:

Example copied from docs:

>>> Post.objects.create(name='First post', tags=['thoughts', 'django'])
>>> Post.objects.create(name='Second post', tags=['thoughts'])
>>> Post.objects.create(name='Third post', tags=['tutorial', 'django'])

>>> Post.objects.filter(tags__contains=['thoughts'])
<QuerySet [<Post: First post>, <Post: Second post>]>

>>> Post.objects.filter(tags__contains=['django'])
<QuerySet [<Post: First post>, <Post: Third post>]>

>>> Post.objects.filter(tags__contains=['django', 'thoughts'])
<QuerySet [<Post: First post>]>

ArrayField has some more powerful features such as overlap and index transforms.

Heterogamete answered 11/1, 2017 at 13:34 Comment(1)
It's a powerful solution, but if you created your models before knowing about the " __in for every item in list" use case, you're probably not using ArrayField and will probably have to do some deep refactoring...Hedwighedwiga
I
7

This also can be done by dynamic query generation using Django ORM and some Python magic :)

from operator import and_
from django.db.models import Q

categories = ['holiday', 'summer']
res = Photo.filter(reduce(and_, [Q(tags__name=c) for c in categories]))

The idea is to generate appropriate Q objects for each category and then combine them using AND operator into one QuerySet. E.g. for your example it'd be equal to

res = Photo.filter(Q(tags__name='holiday') & Q(tags__name='summer'))
Ivie answered 26/12, 2011 at 15:0 Comment(6)
This would not work. Your query examples would not return anything for the models in question.Geomancy
Thanks for correction. I thought chaining filter would be the same as using and for Q objects in one filter... My mistake.Ivie
No worries, my first thought where also Q objects.Geomancy
This would we slower if you work with large tables and large data to compare to. (like 1 Million each)Invertebrate
This approach should work if you switch from filter to exclude and use a negate operator. Like so: res = Photo.exclude(~reduce(and_, [Q(tags__name=c) for c in categories]))Finisterre
As it is, this answer won't return any instances (it tries to find categories whose names are at the same time both "summer" and "holiday", and it will fail). Please either remove it or adjust it :-)Hedwighedwiga
B
3

I use a little function that iterates filters over a list for a given operator an a column name :

def exclusive_in (cls,column,operator,value_list):         
    myfilter = column + '__' + operator
    query = cls.objects
    for value in value_list:
        query=query.filter(**{myfilter:value})
    return query  

and this function can be called like that:

exclusive_in(Photo,'tags__name','iexact',['holiday','summer'])

it also work with any class and more tags in the list; operators can be anyone like 'iexact','in','contains','ne',...

Blueberry answered 11/11, 2018 at 18:55 Comment(0)
S
3

If you struggled with this problem as i did and nothing mentioned helped you, maybe this one will solve your issue

Instead of chaining filter, in some cases it would be better just to store ids of previous filter

tags = [1, 2]
for tag in tags:
    ids = list(queryset.filter(tags__id=tag).values_list("id", flat=True))
    queryset = queryset.filter(id__in=ids)

Using this approach will help you to avoid stacking JOIN in SQL query:

Seitz answered 25/1, 2021 at 11:6 Comment(0)
P
0

My solution: let say author is list of elements that need to match all item in list, so:

        for a in author:
            queryset = queryset.filter(authors__author_first_name=a)
                if not queryset:
                    break
Prolongate answered 30/3, 2021 at 12:5 Comment(0)
B
0
for category in categories:
    query = Photo.objects.filter(tags_name=category)

this piece of code , filters your photos which have all the tags name coming from categories.

Bemba answered 5/3, 2022 at 14:53 Comment(0)
S
-1

If we want to do it dynamically, followed the example:

tag_ids = [t1.id, t2.id]
qs = Photo.objects.all()

for tag_id in tag_ids:
    qs = qs.filter(tag__id=tag_id)    

print qs
Sampling answered 5/5, 2018 at 8:16 Comment(1)
Cannot work as as soon as the second iteration, the queryset will be emptyWarmedover
X
-1
queryset = Photo.objects.filter(tags__name="vacaciones") | Photo.objects.filter(tags__name="verano")
Xantha answered 24/6, 2020 at 4:4 Comment(1)
This will return Photos that are either tagged "vacaciones" or "verano", not Photos tagged with both tags...Hedwighedwiga

© 2022 - 2024 — McMap. All rights reserved.