How to make Django QuerySet bulk delete() more efficient
Asked Answered
F

3

50

Setup:
Django 1.1.2, MySQL 5.1

Problem:

Blob.objects.filter(foo = foo) \
            .filter(status = Blob.PLEASE_DELETE) \
            .delete()

This snippet results in the ORM first generating a SELECT * from xxx_blob where ... query, then doing a DELETE from xxx_blob where id in (BLAH); where BLAH is a ridiculously long list of id's. Since I'm deleting a large amount of blobs, this makes both me and the DB very unhappy.

Is there a reason for this? I don't see why the ORM can't convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?

Fumatorium answered 1/2, 2011 at 20:58 Comment(0)
F
18

Not without writing your own custom SQL or managers or something; they are apparently working on it though.

http://code.djangoproject.com/ticket/9519

Fibrilliform answered 1/2, 2011 at 21:50 Comment(0)
D
55

For those who are still looking for an efficient way to bulk delete in django, here's a possible solution:

The reason delete() may be so slow is twofold: 1) Django has to ensure cascade deleting functions properly, thus looking for foreign key references to your models; 2) Django has to handle pre and post-save signals for your models.

If you know your models don't have cascade deleting or signals to be handled, you can accelerate this process by resorting to the private API _raw_delete as follows:

queryset._raw_delete(queryset.db)

More details in here. Please note that Django already tries to make a good handling of these events, though using the raw delete is, in many situations, much more efficient.

Deina answered 29/4, 2016 at 10:9 Comment(6)
Django can made this as a public function from queryset as bulk_delete, isn't it? Why is it not done? Any high level reasons? And those who use bulk_create is well aware that signals won't work. May be just for cascade delete?Reggie
This works for me. Main caveat as mentioned is dependent models ("cascade delete") - as long as it's a top-most kind of table with no dependencies, this works, and works really fast.Kho
@Reggie the reasons are discussed hereAve
I had to use this because an exception raised in a model's __init__. It seems unreasonably difficult to do this, but suggestion works.Kanarese
This is not true as of Django 3.2 / 4.x. It seems the delete() does proper bulk deletion automatically.Skiff
bulk_delete() in Django 4.2.x still shows the behavior described by OP, although you generally need to delete > 100 objects to see multiple DELETE queries issued.Spireme
F
18

Not without writing your own custom SQL or managers or something; they are apparently working on it though.

http://code.djangoproject.com/ticket/9519

Fibrilliform answered 1/2, 2011 at 21:50 Comment(0)
O
14

Bulk delete is already part of django

Keep in mind that this will, whenever possible, be executed purely in SQL

Outbalance answered 10/10, 2012 at 7:33 Comment(8)
The Queryset.delete() behaves as how the OP described.Knives
It is still inefficient and results in OperationalError: (2006, 'MySQL server has gone away')Exalt
Bulk - in this context - means that one light-weight SQL query is executed on the server based on WHERE condition. Django ORM's delete() does not act like this.Rodger
the link open a 404 page on djangoproject, was it removed?Bemean
The link pointed to a Django 1.4 doc page, which are long gone. Here is the corresponding page for the current version: docs.djangoproject.com/en/1.11/topics/db/queries/…Stratagem
@KrisKumler not any more it doesn't. If there are no cascades and no signal listeners, then it does a single DELETE statement.Ave
@MikkoOhtamaa it does now, if it is safe for it to do soAve
This is not true as of Django 3.2 / 4.x. It seems the delete() does proper bulk deletion.Skiff

© 2022 - 2024 — McMap. All rights reserved.