PostgreSQL - how to run VACUUM from code outside transaction block?
Asked Answered
B

9

49

I am using Python with psycopg2 and I'm trying to run a full VACUUM after a daily operation which inserts several thousand rows. The problem is that when I try to run the VACUUM command within my code I get the following error:

psycopg2.InternalError: VACUUM cannot run inside a transaction block

How do I run this from the code outside a transaction block?

If it makes a difference, I have a simple DB abstraction class, a subset of which is displayed below for context (not runnable, exception-handling and docstrings omitted and line spanning adjustments made):

class db(object):
    def __init__(dbname, host, port, user, password):
        self.conn = psycopg2.connect("dbname=%s host=%s port=%s \
                                      user=%s password=%s" \
                                      % (dbname, host, port, user, password))

        self.cursor = self.conn.cursor()

    def _doQuery(self, query):
        self.cursor.execute(query)
        self.conn.commit()

    def vacuum(self):
        query = "VACUUM FULL"
        self._doQuery(query)
Bibliofilm answered 19/6, 2009 at 11:26 Comment(3)
@nosklo, Good suggestion, but according to the Postgres docs that is the same as COMMIT.Bibliofilm
Are you using SQLAlchemy by any chance? I experienced a similiar issue because setting autocommit=True in SqlAlchemy does not actually turn off transactions. Using set_isolation_level is a work around that accesses the internal methods of the psycopg2 connection.Unreality
@MichaelAquilina I believe at the time (this was 6 years ago now) this was for part of a project that wasn't using an ORM.Bibliofilm
B
83

After more searching I have discovered the isolation_level property of the psycopg2 connection object. It turns out that changing this to 0 will move you out of a transaction block. Changing the vacuum method of the above class to the following solves it. Note that I also set the isolation level back to what it previously was just in case (seems to be 1 by default).

def vacuum(self):
    old_isolation_level = self.conn.isolation_level
    self.conn.set_isolation_level(0)
    query = "VACUUM FULL"
    self._doQuery(query)
    self.conn.set_isolation_level(old_isolation_level)

This article (near the end on that page) provides a brief explanation of isolation levels in this context.

Bibliofilm answered 19/6, 2009 at 12:18 Comment(3)
Or, avoiding magic numbers: self.conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)Rahmann
I had been roaming on a number of websites trying to find the solution. This worked like magic. Thank you @wayneSecrecy
@InaamIlahi I'm really glad this could still help someone after all these years! :)Bibliofilm
S
6

For anyone else that has tried all the suggestions around this problem without success, you may be suffering the same fate as me: I had 2 (or more) SQL statements in one execute() call. It turns out that Postgres itself resets any autocommit/isolation after the first statement (separated by a ;). I finally came across the solution here: https://github.com/psycopg/psycopg2/issues/1201

So don't do something like this:

cursor.execute("SELECT 1; VACUUM FULL")

Instead do:

cursor.execute("SELECT 1")
cursor.execute("VACUUM FULL")
Sinkage answered 17/3, 2021 at 4:45 Comment(0)
H
5

Additionally, you can also get the messages given by the Vacuum or Analyse using:

>> print conn.notices #conn is the connection object

this command print a list with the log message of queries like Vacuum and Analyse:

INFO:  "usuario": processados 1 de 1 páginas, contendo 7 registros vigentes e 0 registros não vigentes; 7 registros amostrados, 7 registros totais estimados   
INFO:  analisando "public.usuario"

This can be useful to the DBAs ^^

Housemaster answered 8/1, 2011 at 21:15 Comment(1)
You need to run cursor.execute('VACUUM FULL VERBOSE') to actually get something in that property.Courage
N
4

While vacuum full is questionable in current versions of postgresql, forcing a 'vacuum analyze' or 'reindex' after certain massive actions can improve performance, or clean up disk usage. This is postgresql specific, and needs to be cleaned up to do the right thing for other databases.

from django.db import connection
# Much of the proxy is not defined until this is done
force_proxy = connection.cursor()
realconn=connection.connection
old_isolation_level = realconn.isolation_level
realconn.set_isolation_level(0)
cursor = realconn.cursor()
cursor.execute('VACUUM ANALYZE')
realconn.set_isolation_level(old_isolation_level)

Unfortunately the connection proxy provided by django doesn't provide access to set_isolation_level.

Neogaea answered 19/12, 2012 at 15:9 Comment(1)
What does force_proxy = connection.cursor()? You don't use that variable.Kory
H
2

Note if you're using Django with South to perform a migration you can use the following code to execute a VACUUM ANALYZE.

def forwards(self, orm):

    db.commit_transaction()
    db.execute("VACUUM ANALYZE <table>")

    #Optionally start another transaction to do some more work...
    db.start_transaction()
Heelandtoe answered 15/5, 2014 at 17:18 Comment(0)
G
1

I don't know psycopg2 and PostgreSQL, but only apsw and SQLite, so I think I can not give a "psycopg2" help.

But it seams to me, that PostgreSQL might work similar as SQLite does, it has two modes of operation:

  • Outside a transaction block: This is semantically equivalent to have a transaction block around every single SQL operation
  • Inside a transaction block, that is marked by "BEGIN TRANSACTION" and ended by "END TRANSACTION"

When this is the case, the problem could be inside the access layer psycopg2. When it does normally operate in a way that transactions are implicitely inserted until a commit is made, there could be no "standard way" to make a vacuum.

Of course it could be possible, that "psycopg2" has its special "vacuum" method, or a special operation mode, where no implicit transactions are started.

When no such possibilities exists, there stays one single option (without changing the access layer ;-) ):

Most databases have a shell programm to access the database. The program could run this shell program with a pipe (entering the vacuum-command into the shell), thus using the shell programm to make the vacuum. Since vacuum is a slow operation as such, the start of an external programm will be neglectible. Of course, the actual program should commit all uncommited work before, else there could be a dead-lock situation - the vacuum must wait until end of your last transaction.

Gest answered 19/6, 2009 at 11:59 Comment(1)
Thanks for your detailed answer. It turns out the solution was to do with "isolation levels", see my answer below.Bibliofilm
M
1

While it's not the OP's exact situation, I ran into this as well. It turns out it was related to an implicit transaction being opened when the connection is used as a context. For instance, if the OP's code had been something like this, even with the correct isolation levels it would've hit the exact same error for different reasons:

class db(object):
    def __init__(dbname, host, port, user, password):
        self.conn = psycopg2.connect("dbname=%s host=%s port=%s \
                                      user=%s password=%s" \
                                      % (dbname, host, port, user, password))
        self.conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)

    def _doQuery(self, query):
        # THE FOLLOWING LINE OPENS AN IMPLICIT TRANSACTION
        # No matter the autocommit settings, this WILL create a transaction
        with self.conn:
            with conn.cursor() as cursor:
                cursor.execute(query)
                self.conn.commit()

    def vacuum(self):
        query = "VACUUM FULL"
        self._doQuery(query)

The solution is to just not use the connection as a context:

    def _doQuery(self, query):
        # Note no "with conn:" context anymore
        with conn.cursor() as cursor:
            cursor.execute(query)
            self.conn.commit()
Magniloquent answered 10/1, 2023 at 20:11 Comment(0)
S
0
    from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
    def vacuum_analyze_table(engine, tablename: str):
        # A raw connection is needed, as a standard connection does not expose the `set_isolation_level` method
        connection = engine.raw_connection()
        
        # Because a raw connection/cursor is being used, DO NOT WRAP your query with `text(...)` as
        # it will throw the error: boolean value of this clause is not defined
        vacuum_analyze_query = f'VACUUM ANALYZE {tablename}'
        
        # There is no get_isolation_level method on a raw connection, so to get the original value, grab it directly
        old_isolation = connection.isolation_level
        
        # This rectifies the error: psycopg2.errors.ActiveSqlTransaction: VACUUM cannot run inside a transaction block
        connection.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
        cursor = connection.cursor()
        cursor.execute(vacuum_analyze_query)
        
        # Restore isolation level
        connection.set_isolation_level(old_isolation)
Smetana answered 4/3 at 21:51 Comment(0)
Y
-3

Don't do it - you don't need VACUUM FULL. Actually if you run somewhat recent version of Postgres (let's say > 8.1) you don't even need to run plain VACUUM manually.

Yiddish answered 19/6, 2009 at 12:22 Comment(6)
Depending on your usage patterns, there are still times it makes sense to vacuum manually imho.Vitebsk
There are, but there aren't as many anymore. And it should definitely not be VACUUM FULL.Katalin
I'm getting into PostGres and with some large tables. All the books (talking from a 8.* or 9.* perspective) talk about running VACUUM ANALYZE manually after a lot of updates, or automatically with a daemon.Citrin
"after a daily operation which inserts several thousand rows", utilities like this should definitely VACUUM when they're finished.Mallemuck
@Brad Koch: there's a significant difference between VACUUM and VACUUM FULL.Yiddish
Huge difference, but you do want a VACUUM ANALYZE after a large batch of updates, whether it's triggered by you or autovacuum. "you don't even need to run plain VACUUM manually" on its own is a bit misleading.Mallemuck

© 2022 - 2024 — McMap. All rights reserved.