How to set "REPEATABLE READ" for a transaction In Django?
Asked Answered
K

3

26

I have a function, that does multiple queries on the same dataset and I want to ensure all the queries would see exactly the same data.

In terms of SQL, this means REPEATABLE READ isolation level for the databases that support it. I don't mind having higher level or even a complete lockdown if the database isn't capable.

As far as I see, this isn't the case. I.e. if I run something like this code in one Python shell:

with transaction.atomic():
    for t in range(0, 60):
        print("{0}: {1}".format(t, MyModel.objects.count()))
        time.sleep(1)

As soon as I do MyModel.objects.create(...) in another, the value seen by the running loop increase immediately. Which is exactly what I want to avoid. Further tests shows the behavior matches READ COMMITTED level, which is too lax for my tastes.

I'd also want to stress the point, I want stricter isolation level only for a single function, not for the whole project.

What are my best options to achieve this?

In my particular case, the only database I care of is PostgreSQL 9.3+, but I also want some compatibility with SQLite3 in which case even completely locking the whole database is okay with me. Yet, obviously, the more general the solution is, the more preferred it is.

Kweiyang answered 20/6, 2015 at 15:31 Comment(2)
is caching this metadata is an acceptable approach?Lippold
Unfortunately, no. The queries I do calculate various statistics over raw event data and to have consistent view, I'll have to pull the whole dataset in memory, which is something I really don't want to do.Kweiyang
T
25

You're right, default transaction isolation level in postgres is READ COMMITTED. You can easily change it in settings to test whether it would fit your needs: https://docs.djangoproject.com/en/1.8/ref/databases/#isolation-level

Also I doubt you will face some performance issues because postgres operates very efficiently while working with transactions. Even in SERIALIZABLE mode. Also mysql has REPEATABLE READ default isolation level and as we see it doesn't hurt performance too.

Anyway you can set isolation mode manually whenever you need like this: http://initd.org/psycopg/docs/extensions.html#isolation-level-constants

To set custom transaction isolation level you can try smth like:

from django.db import connection

with transaction.atomic():
    cursor = connection.cursor()
    cursor.execute('SET TRANSACTION ISOLATION LEVEL REPEATABLE READ')
    # logic

Also I would suggest you to change default mode in settings first (if you can). Then if will fit your needs you can remove it and modify code in special places.

Thunderstorm answered 20/6, 2015 at 16:44 Comment(6)
Thanks for reply. I've had performance issues before (I think this was about five years ago) and changing default isolation level from SERIALIZABLE to READ COMMITTED (that was non-Django project) had reasonable impact on performance. Maybe it was just my particular case, or I had hit some bug. Either way, I'm sort of reluctant of setting overly strict isolation level globally and would really like to know if there's a way to change this just for the duration of execution of a single function or code block. Or to know this is just not feasible because something in the ORM design prevents this.Kweiyang
Of course there're some scenarios where strict isolation modes are overkill. Anyway postgres will do its best :) Setting some non-default isolation level for particular parts of your code is absolutely ok. Using the same one in the whole project is just more consistent as it ensures that you will not at least forget to do it somewhere. Updated the answer with code sample.Thunderstorm
the problem is you have to call SET TRANSACTION ISOLATION LEVEL REPEATABLE READ before any query and it's not simple to be done in DjangoCardiomegaly
Potentially you could define 2 separate connections to the same database with different isolation levels, and use the more restricted connection whenever you need it?Hickory
Here's an answer that shows how to set up two separate connections with different isolation levels, as suggested by DanH.Fugacity
The main problem with higher isolation level that you need to be careful of isn't the overhead of the isolation level (they're mostly negligible on Postgres), but rather the additional blocking (if Postgres detected a possible write conflict) and serialization exceptions that you may get in the application if Postgres decided that the concurrent transaction can no longer continue because of a conflicting write just got committed. If your application doesn't have conflicts, the overhead should be negligible.Sommers
E
1

django-pgtransaction offers an extension of the django.db.transaction.atomic context manager/decorator which allows one to dynamically set the isolation level when opening a transaction, as well as specifying a retry policy for when an operation in that transaction results in a Postgres locking exception:

@pgtransaction.atomic(isolation_level=pgtransaction.SERIALIZABLE, retry=3)
def do_queries():
    # Do queries...
Extravasate answered 21/9, 2022 at 5:40 Comment(0)
F
0

There are 2 ways to set REPEATABLE READ in Django.

<The 1st way (My recommendation)>

You can run the raw query to set isolation level after database settings in settings.py as shown below. *If running the raw query before database settings, error occurs:

# "settings.py"

from django.db import connection

# ...

DATABASES = {
    'default':{
        'ENGINE':'django.db.backends.postgresql',
        'NAME':'postgres',
        'USER':'postgres',
        'PASSWORD':'admin',
        'HOST':'localhost',
        'PORT':'5432',
    },
}

# ↓ ↓ ↓ Set isolation level ↓ ↓ ↓

cursor = connection.cursor()
query = """
        ALTER DATABASE postgres 
        SET DEFAULT_TRANSACTION_ISOLATION 
        TO 'REPEATABLE READ';
        """
cursor.execute(query)

# ↓ ↓ ↓ Check isolation level ↓ ↓ ↓

cursor.execute('SHOW default_transaction_isolation;')
print(cursor.fetchone()) # ('repeatable read',)

*settings.py is run every time Django Server is run with the command below or every time Django Server is reloaded by writing code so transaction is set every time Django Server is run with the command below or every time Django Server is reloaded by writing code:

python manage.py runserver 0.0.0.0:8000

<The 2nd way>

You can directly set REPEATABLE READ with psql as shown below:

postgres=# ALTER DATABASE postgres SET DEFAULT_TRANSACTION_ISOLATION TO 'REPEATABLE READ';

Actually, what the documentation explains as shown below doesn't work for me with Django 3.2.16 on Windows 11. That's why I show the 2 ways above:

# "settings.py"

import psycopg2.extensions

DATABASES = {
    'default':{
        'ENGINE':'django.db.backends.postgresql',
        'NAME':'postgres',
        'USER':'postgres',
        'PASSWORD':'admin',
        'HOST':'localhost',
        'PORT':'5432',
    },
    'OPTIONS': { # ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ Doesn't work ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
        'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
    },
}
Frosty answered 5/12, 2022 at 3:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.