Django MPTT Postgres update query runs slowly
Asked Answered
A

2

8

I'm using mptt in a model to manage a tagging system (each tag has an optional TreeForeignKey to a 'parent' tag)

Whenever I need to save a tag model, the following query runs exceptionally slow (upwards of 45 seconds)

 UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + %s) WHERE "taxonomy_taxonomy"."tree_id" > %s 

I send the content of articles through an automated tagging system, which can generate upwards of 20 tags. Obviously, that won't fly :)

I added the db_index=False hoping to change write times (reads don't seem to be a problem) but the problem persists.

Here's the model in question:

class Taxonomy(MPTTModel):

parent = TreeForeignKey('self',blank=True,null=True,related_name='children',verbose_name='Parent', db_index=False)
parent_name = models.CharField(max_length=64, blank=True, null=True, editable=False)
name = models.CharField(verbose_name='Title', max_length=100, db_index=True)
slug = models.SlugField(verbose_name='Slug', blank=True)
primary = models.BooleanField(
    verbose_name='Is Primary',
    default=False,
    db_index=True,
)
type = models.CharField(max_length=30, db_index=True)
created_date = models.DateTimeField(auto_now_add=True, null=True)
updated_date = models.DateTimeField(auto_now=True, null=True)
publication_date = models.DateTimeField(null=True, blank=True)
scheduled_date = models.DateTimeField(null=True, blank=True)
workflowstate = models.CharField(max_length=30, default='draft')
created_by = models.ForeignKey(User, null=True)

paid_content = models.BooleanField(verbose_name='Is Behind the Paywall', default=False, blank=True)
publish_now = True
show_preview = False
temporary = models.BooleanField(default=False)

def save(self, *args, **kwargs):

    if self.slug is None:
        self.slug = self.name

    if not self.slug:
        self.slug = slugify(self.name)[:50]

    if self.parent:
        self.parent_name = self.parent.name

    self.slug = slugify(self.slug)
    self.workflowstate = "published"

    super(Taxonomy, self).save(*args, **kwargs)

    store_to_backend_mongo(self)
    publish_to_frontend(self)

And the query plan (as reported by New Relic):

1) Update on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)

2) -> Seq Scan on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)

3) Filter: ?

Finally, the traceback from such a query:

    Traceback (most recent call last):
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 711, in __iter__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 1087, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/base.py", line 132, in get_response
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/framework_django.py", line 499, in wrapper
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/contrib/auth/decorators.py", line 22, in _wrapped_view
File "./editorial/views.py", line 242, in calculate_queryly
File "./editorial/views.py", line 292, in queryly_function
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/manager.py", line 127, in manager_method
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 348, in create
File "./taxonomy/models.py", line 179, in save
File "./taxonomy/models.py", line 58, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 946, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 702, in insert_at
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 467, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 491, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 726, in _create_tree_space
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 364, in _mptt_update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 563, in update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/database_dbapi2.py", line 22, in execute

Any idea how I can get these model saves to be speedier?

EDIT for further info: This is in Postgres, with the psycopg2 engine 'ENGINE': 'django.db.backends.postgresql_psycopg2',

SECOND EDIT: As requested, I ran the query with EXPLAIN ANALYZE. The result is as follows:

nj=# EXPLAIN ANALYZE UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + 1) WHERE "taxonomy_taxonomy"."tree_id" > 1;

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Update on taxonomy_taxonomy  (cost=0.00..9588.75 rows=24582 width=132) (actual time=258718.550..258718.550 rows=0 loops=1)
   ->  Seq Scan on taxonomy_taxonomy  (cost=0.00..9588.75 rows=24582 width=132) (actual time=59.956..8271.209 rows=24582 loops=1)
         Filter: (tree_id > 1)
         Rows Removed by Filter: 2
 Planning time: 28.763 ms
 Execution time: 258718.661 ms
(6 rows)
Asperse answered 28/12, 2015 at 16:14 Comment(5)
Whad DB are you using? if MySQL, what engine, InnoDB or MyISAM?Misconstruction
Ah, should've included that. I'll edit it now. I'm using postgres. From settings.py: 'ENGINE': 'django.db.backends.postgresql_psycopg2',Asperse
you should run the same query on psql with EXPLAIN ANALYZE and add the output here (the one from new relic is incomplete)Overheat
Thanks Tommaso, I've now done that as well. Am I right that this is flying through 25k rows of a table, iterating all of them?I understand why it would do this (the tree_id sets relevant elements close to each other) but does this mean the tree_id field isn't suited for my purposes in a table with 25k rows?Asperse
Are you sure that db_index=False improved things? DB still needs to find which rows to update with the WHERE clauseVitascope
S
9

django mptt uses Nested Set Model

So if your save method causes insert operation django-mptt have to recalculate a lot of data. It's just don't work on big tables.

You have to refuse using django-mptt and invent your own database schema.

Stercoricolous answered 8/1, 2016 at 18:32 Comment(2)
is this still true? is mptt still using nested set model?Cryptonymous
github.com/django-mptt/django-mptt MPTT is a technique for storing hierarchical data in a database. The aim is to make retrieval operations very efficient. The trade-off for this efficiency is that performing inserts and moving items around the tree is more involved, as there's some extra work required to keep the tree structure in a good state at all times.Stercoricolous
A
1

There appears to be a lot of updates across the table when you modify the tree. On postgres, this will result in a lot of deleted rows that are actually not really deleted unless you do a vacuum full. we experienced a blown up table whose size shrank to 0.3% after a vacuum. Accordingly the performance increased a lot.

Auscultation answered 1/4, 2019 at 7:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.