Django MPTT Postgres update query runs slowly

A

2

8

I'm using mptt in a model to manage a tagging system (each tag has an optional TreeForeignKey to a 'parent' tag)

Whenever I need to save a tag model, the following query runs exceptionally slow (upwards of 45 seconds)

 UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + %s) WHERE "taxonomy_taxonomy"."tree_id" > %s

I send the content of articles through an automated tagging system, which can generate upwards of 20 tags. Obviously, that won't fly :)

I added the db_index=False hoping to change write times (reads don't seem to be a problem) but the problem persists.

Here's the model in question:

class Taxonomy(MPTTModel):

parent = TreeForeignKey('self',blank=True,null=True,related_name='children',verbose_name='Parent', db_index=False)
parent_name = models.CharField(max_length=64, blank=True, null=True, editable=False)
name = models.CharField(verbose_name='Title', max_length=100, db_index=True)
slug = models.SlugField(verbose_name='Slug', blank=True)
primary = models.BooleanField(
    verbose_name='Is Primary',
    default=False,
    db_index=True,
)
type = models.CharField(max_length=30, db_index=True)
created_date = models.DateTimeField(auto_now_add=True, null=True)
updated_date = models.DateTimeField(auto_now=True, null=True)
publication_date = models.DateTimeField(null=True, blank=True)
scheduled_date = models.DateTimeField(null=True, blank=True)
workflowstate = models.CharField(max_length=30, default='draft')
created_by = models.ForeignKey(User, null=True)

paid_content = models.BooleanField(verbose_name='Is Behind the Paywall', default=False, blank=True)
publish_now = True
show_preview = False
temporary = models.BooleanField(default=False)

def save(self, *args, **kwargs):

    if self.slug is None:
        self.slug = self.name

    if not self.slug:
        self.slug = slugify(self.name)[:50]

    if self.parent:
        self.parent_name = self.parent.name

    self.slug = slugify(self.slug)
    self.workflowstate = "published"

    super(Taxonomy, self).save(*args, **kwargs)

    store_to_backend_mongo(self)
    publish_to_frontend(self)

And the query plan (as reported by New Relic):

1) Update on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)

2) -> Seq Scan on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)

3) Filter: ?

Finally, the traceback from such a query:

    Traceback (most recent call last):
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 711, in __iter__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 1087, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/base.py", line 132, in get_response
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/framework_django.py", line 499, in wrapper
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/contrib/auth/decorators.py", line 22, in _wrapped_view
File "./editorial/views.py", line 242, in calculate_queryly
File "./editorial/views.py", line 292, in queryly_function
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/manager.py", line 127, in manager_method
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 348, in create
File "./taxonomy/models.py", line 179, in save
File "./taxonomy/models.py", line 58, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 946, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 702, in insert_at
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 467, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 491, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 726, in _create_tree_space
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 364, in _mptt_update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 563, in update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/database_dbapi2.py", line 22, in execute

Any idea how I can get these model saves to be speedier?

EDIT for further info: This is in Postgres, with the psycopg2 engine 'ENGINE': 'django.db.backends.postgresql_psycopg2',

SECOND EDIT: As requested, I ran the query with EXPLAIN ANALYZE. The result is as follows:

nj=# EXPLAIN ANALYZE UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + 1) WHERE "taxonomy_taxonomy"."tree_id" > 1;

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Update on taxonomy_taxonomy  (cost=0.00..9588.75 rows=24582 width=132) (actual time=258718.550..258718.550 rows=0 loops=1)
   ->  Seq Scan on taxonomy_taxonomy  (cost=0.00..9588.75 rows=24582 width=132) (actual time=59.956..8271.209 rows=24582 loops=1)
         Filter: (tree_id > 1)
         Rows Removed by Filter: 2
 Planning time: 28.763 ms
 Execution time: 258718.661 ms
(6 rows)

Asperse answered 28/12, 2015 at 16:14 Comment(5)

Whad DB are you using? if MySQL, what engine, InnoDB or MyISAM? – Misconstruction 28/12, 2015 at 17:28

Ah, should've included that. I'll edit it now. I'm using postgres. From settings.py: 'ENGINE': 'django.db.backends.postgresql_psycopg2', – Asperse 28/12, 2015 at 17:50

you should run the same query on psql with EXPLAIN ANALYZE and add the output here (the one from new relic is incomplete) – Overheat 28/12, 2015 at 19:7

Thanks Tommaso, I've now done that as well. Am I right that this is flying through 25k rows of a table, iterating all of them?I understand why it would do this (the tree_id sets relevant elements close to each other) but does this mean the tree_id field isn't suited for my purposes in a table with 25k rows? – Asperse 29/12, 2015 at 21:20

Are you sure that db_index=False improved things? DB still needs to find which rows to update with the WHERE clause – Vitascope 8/1, 2016 at 12:32

S

9

django mptt uses Nested Set Model

So if your save method causes insert operation django-mptt have to recalculate a lot of data. It's just don't work on big tables.

You have to refuse using django-mptt and invent your own database schema.

Stercoricolous answered 8/1, 2016 at 18:32 Comment(2)

is this still true? is mptt still using nested set model? – Cryptonymous 29/6, 2020 at 8:26

github.com/django-mptt/django-mptt MPTT is a technique for storing hierarchical data in a database. The aim is to make retrieval operations very efficient. The trade-off for this efficiency is that performing inserts and moving items around the tree is more involved, as there's some extra work required to keep the tree structure in a good state at all times. – Stercoricolous 29/6, 2020 at 9:30

A

1

There appears to be a lot of updates across the table when you modify the tree. On postgres, this will result in a lot of deleted rows that are actually not really deleted unless you do a vacuum full. we experienced a blown up table whose size shrank to 0.3% after a vacuum. Accordingly the performance increased a lot.

Auscultation answered 1/4, 2019 at 7:52 Comment(0)

Recommended topics

Hot tags