Accessing a MySQL connection pool from Python multiprocessing

4

15

I'm trying to set up a MySQL connection pool and have my worker processes access the already established pool instead of setting up a new connection each time.

I'm not sure whether I should pass the database cursor to each process, or if there's some other way to do this. Shouldn't mysql.connector do the pooling automatically? When I check my log files, many, many connections are opened and closed... one for each process.

My code looks something like this:

import os
import sys
import multiprocessing
from time import sleep

import mysql.connector.pooling
from mysql.connector.cursor import MySQLCursorDict

import config  # local module providing dbconfig (not shown)

PATH = "/tmp"

class DB(object):
  def __init__(self):
    connected = False
    while not connected:
      try:
        cnxpool = mysql.connector.pooling.MySQLConnectionPool(pool_name="pool1",
                                                              **config.dbconfig)
        self.__cnx = cnxpool.get_connection()
        connected = True
      except mysql.connector.errors.PoolError:
        print("Sleeping.. (Pool Error)")
        sleep(5)
      except mysql.connector.errors.DatabaseError:
        print("Sleeping.. (Database Error)")
        sleep(5)

    self.__cur = self.__cnx.cursor(cursor_class=MySQLCursorDict)

  def execute(self, query):
    return self.__cur.execute(query)

def isValidFile(name):
  return True

def readfile(fname):
  d = DB()
  d.execute("""INSERT INTO users (first_name) VALUES ('michael')""")

def init(queue):
  # placeholder worker initializer referenced by multiprocessing.Pool below
  pass

def main():
  queue = multiprocessing.Queue()
  pool = multiprocessing.Pool(None, init, [queue])
  for dirpath, dirnames, filenames in os.walk(PATH):

    full_path_fnames = map(lambda fn: os.path.join(dirpath, fn),
                           filenames)
    full_path_fnames = filter(isValidFile, full_path_fnames)
    pool.map(readfile, full_path_fnames)

if __name__ == '__main__':
  sys.exit(main())
Rhodia answered 23/6, 2014 at 19:43 Comment(1)
Maybe eric.lubow.org/2009/python/… is useful for you.Horseleech
13

First, you're creating a different connection pool for each instance of your DB class. The pools having the same name doesn't make them the same pool.

From the documentation:

It is not an error for multiple pools to have the same name. An application that must distinguish pools by their pool_name property should create each pool with a distinct name.
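
A small illustration of that point - a sketch only, where dbconfig is a placeholder for your real connection settings:

from mysql.connector.pooling import MySQLConnectionPool

# dbconfig holds placeholder connection settings; replace with real values.
dbconfig = {"host": "127.0.0.1", "user": "root", "password": "...", "database": "test"}

pool_a = MySQLConnectionPool(pool_name="pool1", pool_size=2, **dbconfig)
pool_b = MySQLConnectionPool(pool_name="pool1", pool_size=2, **dbconfig)

# Two independent pools, each opening pool_size connections up front,
# even though they share the same pool_name.
print(pool_a is pool_b)   # False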

Besides that, sharing a database connection (or connection pool) between different processes would be a bad idea (and I highly doubt it would even work correctly), so each process using its own connections is actually what you should aim for.

You could just initialize the pool in your init() initializer as a global variable and use that instead.
A very simple example:

from multiprocessing import Pool
from mysql.connector.pooling import MySQLConnectionPool
from mysql.connector import connect
import os

pool = None  # one pool per worker process, set by init()

def init():
    global pool
    print("PID %d: initializing pool..." % os.getpid())
    pool = MySQLConnectionPool(...)  # fill in your pool/connection settings here

def do_work(q):
    con = pool.get_connection()
    print("PID %d: using connection %s" % (os.getpid(), con))
    c = con.cursor()
    c.execute(q)
    res = c.fetchall()
    con.close()  # returns the connection to this process's pool
    return res

def main():
    p = Pool(initializer=init)
    for res in p.map(do_work, ['select * from test']*8):
        print(res)
    p.close()
    p.join()

if __name__ == '__main__':
    main()

Or just use a simple connection instead of a connection pool, as only one connection will be active in each process at a time anyway.
The number of concurrently used connections is implicitly limited by the size of the multiprocessing.Pool.
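
For the plain-connection variant, a minimal sketch could look like this (dbconfig is again a placeholder for your real settings): each worker opens exactly one connection in its initializer and reuses it for every task.

from multiprocessing import Pool
from mysql.connector import connect
import os

# dbconfig holds placeholder connection settings; replace with real values.
dbconfig = {"host": "127.0.0.1", "user": "root", "password": "...", "database": "test"}

cnx = None  # one plain connection per worker process, set by init()

def init():
    global cnx
    print("PID %d: opening connection..." % os.getpid())
    cnx = connect(**dbconfig)

def do_work(q):
    cur = cnx.cursor()
    cur.execute(q)
    res = cur.fetchall()
    cur.close()
    return res

if __name__ == '__main__':
    with Pool(initializer=init) as p:
        for res in p.map(do_work, ['select * from test'] * 8):
            print(res)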

Erastianism answered 23/6, 2014 at 21:45 Comment(9)
That's not a good way of doing it. init will be called in each worker separately, basically generating a new MySQL pool for each worker.Kaenel
It would create one connection pool for each worker process when the worker process is started. One connection per worker may be more suitable than a pool per worker, depending on the use case, as I've explained at the bottom of my answer. But the question is explicitly about using a connection pool with multiprocessing, which is why I kept the pool in my example.Erastianism
The correct approach would be to use one pool that each process can access, which is what the OP asked for. Making a separate pool for each process is redundant and opens up way too many connections, which is exactly the problem the OP is mentioning.Kaenel
The only way of using one single pool with many processes is to have one dedicated process do all the db access and communicate with it using a queue (see the rough sketch after these comments) - but that would involve substantial overhead for pickling and unpickling the data. Like I said, I'd just use a couple of workers with one connection each, which would be the same number of connections as using a pool of that size. But if you think a solution with one pool would be better, you should just submit your own answer.Erastianism
Unfortunately, I am unsure of how to solve this elegantly. I agree that maybe the most pragmatic solution is to open and close a new connection each time a worker is called. I just wanted to point out that a MySQL pool for each worker is not a solution to this problem.Kaenel
Why would sharing a database connection (or connection pool) between different processes be a bad idea?Burletta
@AlexanderYau - Because the connection is not explicitly designed to be used from multiple processes (as in this case), the processes using that connection would need to synchronize all accesses to it correctly for it to work, with probably significant overhead - probably more than just creating a new connection. I'm not saying it's impossible, and maybe there are database connectors that implement something like that...Erastianism
While it's usually a good practice to create one connection per thread, the second sentence in the answer seems to contradict the official docs: "Subsequent calls to connect() that name the same connection pool return connections from the existing pool."Jenks
@MagnusLindOxlund That applies to mysql.connector.pooling.connect(pool_name='...', ...) calls, which keep an internal dict of pools by name, but the code above doesn't use that. It explicitly creates new instances of MySQLConnectionPool directly, for which the name doesn't really do anything.Erastianism
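
To make the dedicated-DB-process idea from the comments above concrete, here is a rough sketch (dbconfig is a placeholder for real settings): exactly one process talks to MySQL while the others send it work through queues, which is also where the pickling overhead comes in.

import multiprocessing as mp
from mysql.connector import connect

# dbconfig holds placeholder connection settings; replace with real values.
dbconfig = {"host": "127.0.0.1", "user": "root", "password": "...", "database": "test"}

def db_worker(request_q, result_q):
    cnx = connect(**dbconfig)              # the only process talking to MySQL
    cur = cnx.cursor()
    for sql in iter(request_q.get, None):  # None acts as the shutdown signal
        cur.execute(sql)
        result_q.put(cur.fetchall())       # results are pickled back to the caller
    cur.close()
    cnx.close()

if __name__ == '__main__':
    request_q, result_q = mp.Queue(), mp.Queue()
    db_proc = mp.Process(target=db_worker, args=(request_q, result_q))
    db_proc.start()
    request_q.put('select * from test')
    print(result_q.get())
    request_q.put(None)                    # stop the DB worker
    db_proc.join()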
10
#!/usr/bin/python
# -*- coding: utf-8 -*-
from multiprocessing import Pool

import mysql.connector.pooling


dbconfig = {
    "host":"127.0.0.1",
    "port":"3306",
    "user":"root",
    "password":"123456",
    "database":"test",
}


class MySQLPool(object):
    """
    Create a connection pool when connecting to MySQL, which decreases the time
    spent requesting a connection, creating a connection and closing a connection.
    """
    def __init__(self, host="127.0.0.1", port="3306", user="root",
                 password="123456", database="test", pool_name="mypool",
                 pool_size=3):
        res = {}
        self._host = host
        self._port = port
        self._user = user
        self._password = password
        self._database = database

        res["host"] = self._host
        res["port"] = self._port
        res["user"] = self._user
        res["password"] = self._password
        res["database"] = self._database
        self.dbconfig = res
        self.pool = self.create_pool(pool_name=pool_name, pool_size=pool_size)

    def create_pool(self, pool_name="mypool", pool_size=3):
        """
        Create a connection pool. Once it has been created, requests to connect
        to MySQL can get a connection from this pool instead of creating
        a new connection each time.
        :param pool_name: the name of pool, default is "mypool"
        :param pool_size: the size of pool, default is 3
        :return: connection pool
        """
        pool = mysql.connector.pooling.MySQLConnectionPool(
            pool_name=pool_name,
            pool_size=pool_size,
            pool_reset_session=True,
            **self.dbconfig)
        return pool

    def close(self, conn, cursor):
        """
        Close a MySQL connection and its cursor.
        :param conn: 
        :param cursor: 
        :return: 
        """
        cursor.close()
        conn.close()

    def execute(self, sql, args=None, commit=False):
        """
        Execute a SQL statement, with or without args. The usage is similar
        to the execute() function in the pymysql module.
        :param sql: sql clause
        :param args: args need by sql clause
        :param commit: whether to commit
        :return: if commit, return None, else, return result
        """
        # get a connection from the connection pool instead of creating one.
        conn = self.pool.get_connection()
        cursor = conn.cursor()
        if args:
            cursor.execute(sql, args)
        else:
            cursor.execute(sql)
        if commit is True:
            conn.commit()
            self.close(conn, cursor)
            return None
        else:
            res = cursor.fetchall()
            self.close(conn, cursor)
            return res

    def executemany(self, sql, args, commit=False):
        """
        Execute with many args, similar to the executemany() function in pymysql.
        args should be a sequence.
        :param sql: sql clause
        :param args: args
        :param commit: commit or not.
        :return: if commit, return None, else, return result
        """
        # get a connection from the connection pool instead of creating one.
        conn = self.pool.get_connection()
        cursor = conn.cursor()
        cursor.executemany(sql, args)
        if commit is True:
            conn.commit()
            self.close(conn, cursor)
            return None
        else:
            res = cursor.fetchall()
            self.close(conn, cursor)
            return res


if __name__ == "__main__":
    mysql_pool = MySQLPool(**dbconfig)
    sql = "select * from store WHERE create_time < '2017-06-02'"
    p = Pool()
    for i in range(5):
        p.apply_async(mysql_pool.execute, args=(sql,))
    p.close()
    p.join()  # wait for the workers to finish before exiting

The code above creates a connection pool at the beginning and gets connections from it in execute(). Once the connection pool has been created, the job is just to keep it around; since the pool is created only once, you save the time of requesting a new connection every time you want to connect to MySQL. Hope it helps!
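
As a usage note, execute() also accepts query parameters and a commit flag; a small example (the name column is hypothetical, used only for illustration):

mysql_pool = MySQLPool(**dbconfig)

# Parameterized write: with commit=True the method commits and returns None.
# 'name' is a hypothetical column used for illustration.
mysql_pool.execute("INSERT INTO store (name, create_time) VALUES (%s, %s)",
                   ("shop-1", "2017-06-01"), commit=True)

# Parameterized read: returns cursor.fetchall().
rows = mysql_pool.execute("SELECT * FROM store WHERE name = %s", ("shop-1",))
print(rows)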

Hanhhank answered 20/7, 2017 at 11:46 Comment(2)
How do I call this from different files? Won't that create different pool instances rather than accessing the existing pool object?Ocreate
@Ocreate You can create one pool object and get different connections from that same pool object.Remodel
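
One way to do that across files (a sketch; db.py, app.py and the module name mysql_pool are hypothetical) is to build the MySQLPool once in a shared module and import that single object everywhere, relying on Python's module caching:

# db.py -- runs once per process, on first import
from mysql_pool import MySQLPool, dbconfig  # hypothetical module containing the class above

mysql_pool = MySQLPool(**dbconfig)

# app.py -- any other file imports the same object instead of building a new pool
from db import mysql_pool

rows = mysql_pool.execute("select * from store WHERE create_time < '2017-06-02'")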
0

You created multiple DB object instances. In mysql/connector/pooling.py, pool_name is only an attribute that lets you tell pools apart; creating a new MySQLConnectionPool with an existing name does not hand you back the existing pool.

So if you create a new DB instance in each call to readfile(), you will end up with several connection pools.

A Singleton is useful in this case.

(I spent several hours figuring this out. In the Tornado framework, each HTTP GET creates a new handler, which led to a new connection being made every time.)
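
A minimal sketch of that singleton idea, reusing config.dbconfig from the question (the pool_size is chosen arbitrarily here): build the pool once at module level and let every DB instance borrow connections from it.

import mysql.connector.pooling

import config  # module providing dbconfig, as in the question

_pool = None   # module-level singleton

def get_pool():
    # Create the pool on first use and reuse it afterwards.
    global _pool
    if _pool is None:
        _pool = mysql.connector.pooling.MySQLConnectionPool(
            pool_name="pool1", pool_size=5, **config.dbconfig)  # pool_size is arbitrary
    return _pool

class DB(object):
    def __init__(self):
        # Every DB instance borrows a connection from the same pool.
        self.__cnx = get_pool().get_connection()
        self.__cur = self.__cnx.cursor(dictionary=True)

    def execute(self, query):
        return self.__cur.execute(query)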

Conrado answered 13/7, 2015 at 9:49 Comment(0)
0

There may be synchronization issues if you reuse the MySQLConnection instances maintained by a pool, but simply sharing a MySQLConnectionPool instance between worker processes and using connections retrieved by calling get_connection() should be okay, because a dedicated socket is created for each MySQLConnection instance.

import multiprocessing
from mysql.connector import pooling

def f(cnxpool: pooling.MySQLConnectionPool) -> None:
    # Dedicate connection instance for each worker process.
    cnx = cnxpool.get_connection()
    ...

if __name__ == '__main__':
    cnxpool = pooling.MySQLConnectionPool(
        pool_name='pool',
        pool_size=2,
        # plus the usual connection arguments (host, user, password, database)
    )
    p0 = multiprocessing.Process(target=f, args=(cnxpool,))
    p1 = multiprocessing.Process(target=f, args=(cnxpool,))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
Neurovascular answered 24/12, 2021 at 18:31 Comment(0)
