Why is Django QuerySet with Q() expression returning duplicate values?
Asked Answered
R

1

5

I have the following 2 Django models:

from mptt.models import MPTTModel, TreeForeignKey
from django.db import models
from django.db.models import Q

class Model1(MPTTModel):
    random_field = models.IntegerField()
    parent = TreeForeignKey('self', null=True, blank=True)


class Model2(models.Model):
    model_1 = models.ManyToManyField(Model1)

    @staticmethod
    def descendants_queryset(model1):
        q = Q()
        for curr_descendant in model1.get_descendants:
            q |= Q(model_1=curr_descendant)
        return q

I have created instances like this:

>>> a = Model2.objects.create()
>>> b = Model1.objects.create(random_field=1, parent=None)
>>> c = Model1.objects.create(random_field=2, parent=b)
>>> d = Model1.objects.create(random_field=3, parent=b)
>>> a.model_1.add(c)
>>> a.pk
3

When I do a normal queryset filter and when I use the Q() expression it produces the same results (as expected):

>>> [i.pk for i in Model2.objects.filter(pk=3)]
[3]
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3]

But when I add another instance of the Model1 to the ManyToMany relationship, I see a weird duplication only when I filter using the Q() expression:

>>> a.model_1.add(d)
>>> [i.pk for i in Model2.objects.filter(pk=3)]
[3]
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3, 3]

I'm confused why this duplication is happening. It seems like a bug to me. I can obviously work around it by adding a .distinct() to the queryset. But that seems like it should not be necessary. Why is this happening and what is the proper solution?

Reversible answered 26/12, 2015 at 4:29 Comment(0)
G
8

I noticed when you add a third element to a, your output is not only duplicated but tripled:

>>> 4 = Model1.objects.create(random_field=3, parent=b)
>>> a.model_1.add(e)
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3, 3, 3]

And quadrupled if you add another and so on...

So what I'm guessing is, that since your Q()-query in descendants_queryset() is ORed, it returns every object, which has the b object as parent and the filter matches multiple times for a (which has multple references to Model1 objects).

If we look at the raw SQL for Model2.objects.filter(Model2.descendants_queryset(b)), we see the following:

>>> Model2.objects.filter(Model2.descendants_queryset(b)).query.sql_with_params()
(u'SELECT "Foo_model2"."id" FROM "Foo_model2" LEFT OUTER JOIN "Foo_model2_model_1" ON ("Foo_model2"."id" = "Foo_model2_model_1"."model2_id") WHERE ("Foo_model2_model_1"."model1_id" = %s OR "Foo_model2_model_1"."model1_id" = %s OR "Foo_model2_model_1"."model1_id" = %s)', (17, 18, 19))

Or more readable:

SELECT "Foo_model2"."id"
FROM "Foo_model2"
  LEFT OUTER JOIN "Foo_model2_model_1"
  ON ("Foo_model2"."id" = "Foo_model2_model_1"."model2_id")
WHERE ("Foo_model2_model_1"."model1_id" = 17
  OR "Foo_model2_model_1"."model1_id" = 18
  OR "Foo_model2_model_1"."model1_id" = 19)

So it does actually concatenate the queries which are generated by q |= Q(model_1=curr_descendant) with OR statements, which returns not one, but in this case three references (all to the same Model2 object, which holds ManyToMany-references to three Model1 objects). This is due to the join statement - See here for some examples.

If we add the extra filter for pk=3, it does not limit the output any further, since the PK for all returned objects is identical (3).

If you add another Model2 object, and add c as reference to the new elements model1 ManyToMany-reference, you get the following:

>>> a2 = Model2.objects.create()
>>> a2.model_1.add(c)
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b))]
[3, 3, 3, 4]

The id of the new Model2 object shows up in the queryset as well, since it also has one reference to a model1 object.

I don't have any smashing ideas for the best solution right now, but calling .distinct() on the query set seems pretty straight forward to me.

Gwendolyngweneth answered 3/1, 2016 at 18:27 Comment(2)
I am also facing that same problem, Have you found any solution?Electronic
I have a many-to-many field which is causing that problem, if I remove that field then it doesn't give any duplicatesElectronic

© 2022 - 2024 — McMap. All rights reserved.