How to know which process is responsible for a "OperationalError: database is locked"?
Asked Answered
A

2

12

I sometimes randomly encounter:

OperationalError: database is locked

in a process that updates a SQLite database, but I find it difficult to reproduce the error:

  • no other process is inserting / deleting rows at the same time
  • just one process might do some read-only queries (SELECT, etc.) here and there, but no committing

I've already read OperationalError: database is locked

Question: Is there a way, when this error happens, to log which other process ID is responsible for the lock?

More generally, how to debug a OperationalError: database is locked?

Aphesis answered 12/11, 2018 at 21:42 Comment(5)
Haven't used sqlite in decades, but could it be I/O to disk locking the database? For instance, a large INSERT that triggers the disk to lock the database for a bit if the OS is doing some other heavy/maintenance load on the disk? I doubt that there's sqlite tools for this. It's inherently light weight for a reason. If possible, I would have considered moving over to an actual database engine (PostgreSQL or something) if this becomes a problem.Tussis
This particular exception is raised when the SQLite API signals SQLITE_BUSY, and there is no context with that signal. You can't determine what PID might have been the reason that caused SQLite to return that response.Hotze
Put differently, read up on the what is the PID of the process that causes SQLITE_BUSY questions already here.Hotze
The process that updates the SQLite database is probably at fault here. Put timings around the transactions; record the time when a transaction starts, and log the duration when you commit. Something is taking 5 second or longer here (the default timeout value for the sqlite3.connect() call).Hotze
If your OS is supported by lsof you can use it to find every process that has the file opened and see which one has a lock - probably the one that's in the middle of a select. If you want database readers to not block database writers and vs versa, look into WAL journal mode.Methylal
A
19

Solution: Always close the cursor for (even read-only) queries!

First, here is a way to reproduce the problem:

  1. First run this code, once:

    import sqlite3
    conn = sqlite3.connect('anothertest.db')
    conn.execute("CREATE TABLE IF NOT EXISTS mytable (id int, description text)")
    for i in range(100):
        conn.execute("INSERT INTO mytable VALUES(%i, 'hello')" % i)
    conn.commit()
    

    to initialize the test.

  2. Then begin a read-only query:

    import sqlite3, time
    conn = sqlite3.connect('anothertest.db')
    c = conn.cursor()
    c.execute('SELECT * FROM mytable')
    item = c.fetchone()
    print(item)
    print('Sleeping 60 seconds but the cursor is not closed...')
    time.sleep(60)
    

    and keep this script running while executing the next step:

  3. Then try to delete some content and commit:

    import sqlite3
    conn = sqlite3.connect('anothertest.db')
    conn.execute("DELETE FROM mytable WHERE id > 90")
    conn.commit()
    

    It will trigger this error indeed:

    sqlite3.OperationalError: database is locked

Why? Because it's not possible to delete data that is currently accessed by a read-query: if the cursor it's still open, it means the data could still be fetched with fetchone or fetchall.

Here is how to solve the error: in step #2, just add:

item = c.fetchone()
print(item)
c.close()
time.sleep(60)

Then while this is still running, start script #3, you will see there is no more error.

Aphesis answered 25/11, 2018 at 17:36 Comment(2)
Could this scenario happen if the DB is in wal mode? What if the step 3 is an insert to the table instead of a delete?Angelangela
Hey, how to close the cursor manually in django?Howerton
H
5

Is there a way, when this error happens, to log which other process ID is responsible for the lock?

No, that information is not recorded when the exception occurs. The OperationalError: database is locked exception is usually raised after a timeout (5 minutes is the default) when trying to obtain a mutex and and a file lock in the SQLite internals, at which point SQLite returns SQLITE_BUSY, but SQLITE_BUSY can also be reported at other points. SQLite error codes do not carry any further context such as the PID of another process that held a lock, and it is conceivable that the lock was passed between two other processes before the current process gave up trying to get hold of it!

At best you can enumerate what processes are currently accessing the file by using lsof <filename of database>, but that won't get you any closer to figuring out which one of these is actually taking too long to commit.

I would, instead, instrument your code with explicit transactions and detailed logging as to when you start and commit transactions. Then, when you do run into OperationalError exceptions, you can check the logs for what was happening where within that time window.

A Python context manager that could be used for this is:

import logging
import sys
import time
import threading
from contextlib import contextmanager
from uuid import uuid4

logger = logging.getLogger(__name__)


@contextmanager
def logged_transaction(con, stack_info=False, level=logging.DEBUG):
    """Manage a transaction and log start and end times.

    Logged messages include a UUID transaction ID for ease of analysis.

    If trace is set to True, also log all statements executed.
    If stack_info is set to True, a stack trace is included to record
    where the transaction was started (the last two lines will point to this
    context manager).

    """
    transaction_id = uuid4()
    thread_id = threading.get_ident()

    def _trace_callback(statement):
        logger.log(level, '(txid %s) executing %s', transaction_id, statement)
    if trace:
        con.set_trace_callback(_trace_callback)

    logger.log(level, '(txid %s) starting transaction', transaction_id, stack_info=stack_info)

    start = time.time()
    try:
        with con:
            yield con
    finally:
        # record exception information, if an exception is active
        exc_info = sys.exc_info()
        if exc_info[0] is None:
            exc_info = None
        if trace:
            con.set_trace_callback(None)
        logger.log(level, '(txid %s) transaction closed after %.6f seconds', transaction_id, time.time() - start, exc_info=exc_info)

The above will create start and end entries, include exception information if there is any, optionally trace all statements that are being executed on the connection and can include a stack trace that will tell you the location where the context manager was used. Do make sure to include the date and time in when formatting log messages so you can track when transactions started.

I'd use it around any code using the connection, so you can time selects as well:

with logged_transaction(connection):
    cursor = connection.cursor()
    # ...

It may be that just using this context manager makes your issue disappear, at which point you'd have to analyse why the code without this context manager leaves open a transaction without committing.

You may also want to use a lower timeout value in sqlite3.connect() calls to speed up the process; you may not have to wait a full 5 minutes to detect the situation.

Note on threading: when enabling tracing it is assumed that you use separate connections for separate threads. If this is not the case, then you'll need to permanently register a trace callback that then sorts out what transaction id to use for the current thread.

Hotze answered 13/11, 2018 at 13:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.