What is correct way to use psycopg2 cursors in threads?
Asked Answered
L

2

6

I feel the answer to my question is in these two SO questions, but I find the answers very poorly worded (or above my pay grade)

  1. multi thread python psycopg2
  2. Are transactions in PostgreSQL via psycopg2 per-cursor or per-connection?

Question: What is the correct way to use psycopg2 to ensure it is thread safe

Option 1: Each thread has its own cursor

import threading
import psycopg2

conn = psycopg2.connect (
    host=127.0.0.1,
    user='john',
    password='1234',
    dbname='foo',
    port=1234)

class Foo (threading.Thread):
    def __init__ (self):
        threading.Thread.__init__(self)

    def run (self):
        global conn

        cur = conn.cursor()
        sql_query="SELECT * from foo;"
        print(cur.execute (sql_query))
        conn.commit()

num_threads = 100
threads = []

for i in seq (num_threads):
    threads.append (Foo())

for i in seq (num_threads):
    threads[i].start()

for i in seq (num_threads):
    threads[i].join()

Option 2: Each thread has it's own connection

import threading
import psycopg2

db_conn = psycopg2.connect (
    host=127.0.0.1,
    user='john',
    password='1234',
    dbname='foo',
    port=1234)

class Foo (threading.Thread):
    def __init__ (self):
        threading.Thread.__init__(self)
        self.conn = psycopg2.connect (
            host=127.0.0.1,
            user='john',
            password='1234',
            dbname='foo',
            port=1234)

    def run (self):
        cur = self.conn.cursor()
        sql_query="SELECT * from foo;"
        print(cur.execute (sql_query))
        conn.commit()

num_threads = 100
threads = []

for i in seq (num_threads):
    threads.append (Foo())

for i in seq (num_threads):
    threads[i].start()

for i in seq (num_threads):
    threads[i].join()
Lockout answered 30/10, 2020 at 3:4 Comment(0)
B
6

Each thread should have its own database connection.

A PostgreSQL connection can handle only one statement at a given time (unless you are using a server side cursor, but even then the connection can handle only one FETCH at the same time).

So if several threads were to share a database connection, they'd have to coordinate carefully to make sure that only one thread uses the connection at the same time. For example, you cannot send a new query while another thread is still waiting for a query result.

Being answered 30/10, 2020 at 3:27 Comment(2)
Thanks. If I may be greedy, could you quickly comment on whether there is a performance/bottleneck issues with too many DB connections?Lockout
It is definitely a bad idea to have too many database connections. That's why you are using a connection pool that contains a limited number of persistent database connections. For every transaction a thread wants to perform, you grab one of the pooled connections and return it when you are done. I am sure there is built-in connection pooling in psycopg2.Being
T
4

Seams connections can be shared, fom the Psycopg 2.9.3 documentation: https://www.psycopg.org/docs/connection.html

Connections are thread safe and can be shared among many threads. See Thread and process safety for details.

Transcontinental answered 16/3, 2022 at 10:40 Comment(2)
This worked for me. I was using Process, which failed on newer Linux computers, but threading.Thread did the trick.Liverwort
You need to be very careful when sharing a connection across threads because commits and rollbacks are performed for the entire connection. If you have a multistep transaction and another thread performs a commit in the middle, you may not like the result.Chablis

© 2022 - 2024 — McMap. All rights reserved.