Why is Django QuerySet with Q() expression returning duplicate values?

from mptt.models import MPTTModel, TreeForeignKey from django.db import models from django.db.models import Q class Model1(MPTTModel): random_field = models.IntegerField() parent = TreeForeignKey('self', null=True, blank=True) class Model2(models.Model): model_1 = models.ManyToManyField(Model1) @staticmethod def descendants_queryset(model1): q = Q() for curr_descendant in model1.get_descendants: q |= Q(model_1=curr_descendant) return q

>>> a = Model2.objects.create() >>> b = Model1.objects.create(random_field=1, parent=None) >>> c = Model1.objects.create(random_field=2, parent=b) >>> d = Model1.objects.create(random_field=3, parent=b) >>> a.model_1.add(c) >>> a.pk 3

I noticed when you add a third element to a, your output is not only duplicated but tripled:

>>> 4 = Model1.objects.create(random_field=3, parent=b)
>>> a.model_1.add(e)
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3, 3, 3]

And quadrupled if you add another and so on...

So what I'm guessing is, that since your Q()-query in descendants_queryset() is ORed, it returns every object, which has the b object as parent and the filter matches multiple times for a (which has multple references to Model1 objects).

If we look at the raw SQL for Model2.objects.filter(Model2.descendants_queryset(b)), we see the following:

>>> Model2.objects.filter(Model2.descendants_queryset(b)).query.sql_with_params()
(u'SELECT "Foo_model2"."id" FROM "Foo_model2" LEFT OUTER JOIN "Foo_model2_model_1" ON ("Foo_model2"."id" = "Foo_model2_model_1"."model2_id") WHERE ("Foo_model2_model_1"."model1_id" = %s OR "Foo_model2_model_1"."model1_id" = %s OR "Foo_model2_model_1"."model1_id" = %s)', (17, 18, 19))

Or more readable:

SELECT "Foo_model2"."id"
FROM "Foo_model2"
  LEFT OUTER JOIN "Foo_model2_model_1"
  ON ("Foo_model2"."id" = "Foo_model2_model_1"."model2_id")
WHERE ("Foo_model2_model_1"."model1_id" = 17
  OR "Foo_model2_model_1"."model1_id" = 18
  OR "Foo_model2_model_1"."model1_id" = 19)

So it does actually concatenate the queries which are generated by q |= Q(model_1=curr_descendant) with OR statements, which returns not one, but in this case three references (all to the same Model2 object, which holds ManyToMany-references to three Model1 objects). This is due to the join statement - See here for some examples.

If we add the extra filter for pk=3, it does not limit the output any further, since the PK for all returned objects is identical (3).

If you add another Model2 object, and add c as reference to the new elements model1 ManyToMany-reference, you get the following:

>>> a2 = Model2.objects.create()
>>> a2.model_1.add(c)
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b))]
[3, 3, 3, 4]

The id of the new Model2 object shows up in the queryset as well, since it also has one reference to a model1 object.

I don't have any smashing ideas for the best solution right now, but calling .distinct() on the query set seems pretty straight forward to me.

Recommended topics

Hot tags