Caching/reusing a DB connection for later view usage
I am saving a user's database connection. On the first time they enter in their credentials, I do something like the following:

self.conn = MySQLdb.connect (
    host = 'aaa',
    user = 'bbb',
    passwd = 'ccc',
    db = 'ddd',
    charset='utf8'
)
cursor = self.conn.cursor()
cursor.execute("SET NAMES utf8")
cursor.execute('SET CHARACTER SET utf8;')
cursor.execute('SET character_set_connection=utf8;')

I then have the conn ready to go for all the user's queries. However, I don't want to re-connect every time the view is loaded. How would I store this "open connection" so I can just do something like the following in the view:

def do_queries(request, sql):
    user = request.user
    conn = request.session['conn']
    cursor = request.session['cursor']
    cursor.execute(sql)

Update: it seems like the above is not possible and not good practice, so let me re-phrase what I'm trying to do:

I have a sql editor that a user can use after they enter in their credentials (think of something like Navicat or SequelPro). Note this is NOT the default django db connection -- I do not know the credentials beforehand. Now, once the user has 'connected', I would like them to be able to do as many queries as they like without me having to reconnect every time they do this. For example -- to re-iterate again -- something like Navicat or SequelPro. How would this be done using python, django, or mysql? Perhaps I don't really understand what is necessary here (caching the connection? connection pooling? etc.), so any suggestions or help would be greatly appreciated.

Endosteum answered 5/11, 2018 at 2:58 Comment(13)
@kungphu for this, the use case is we have a sql editor and someone enters in their sql credentials and then they can query their database and we display the results (think any sql gui). What do you think would be the best way to 'store' the connection without having to keep re-connecting?Endosteum
@kungphu -- updated question.Endosteum
Well, a simple approach is to not close the session if you want to reuse it :-)Dateline
@Dateline -- could you please demonstrate as to how that would be done in practice?Endosteum
I'm a database guy, not a programmer. I don't know about your client, but every SQL client can logon once and then use this connection until it explicitly logs off.Dateline
@Dateline I see -- right, I guess the challenge in the above is how to 'get' the open connection once that's been established.Endosteum
@Dateline does the IoC answer below describe what you're suggesting here?Endosteum
Sounds promising :-)Dateline
What webserver do you use with your python backend?Astounding
@ffeast django/wsgi/apacheEndosteum
How many users do you envision being active at any one time? Also, are multiple queries being sent via multiple http requests? If you're using Django underneath, I'd assume so.Balkh
@JonahBishop any number of queries or users could be active at any one time. The alternative would be to authenticate each time via the connection credentials, so it would be the same as looking up a user, for example (though in this case, looking up a DB connection).Endosteum
I find it hard to provide a proper solution without knowing the ins and outs of the application. My first approach would be creating a permanent connection to the server that would be used as a proxy for other connections, but again, I'd need to know more details.Laurenelaurens

You could use an IoC container to store a singleton provider for you. Essentially, instead of constructing a new connection every time, it will only construct it once (the first time ConnectionContainer.connection_provider() is called) and thereafter it will always return the previously constructed connection.

You'll need the dependency-injector package for my example to work:

import MySQLdb
import dependency_injector.containers as containers
import dependency_injector.providers as providers


class ConnectionProvider():
    def __init__(self, host, user, passwd, db, charset):
        self.conn = MySQLdb.connect(
            host=host,
            user=user,
            passwd=passwd,
            db=db,
            charset=charset
        )


class ConnectionContainer(containers.DeclarativeContainer):
    connection_provider = providers.Singleton(ConnectionProvider,
                                              host='aaa',
                                              user='bbb',
                                              passwd='ccc',
                                              db='ddd',
                                              charset='utf8')


def do_queries(request, sql):
    user = request.user
    conn = ConnectionContainer.connection_provider().conn
    cursor = conn.cursor()
    cursor.execute(sql)

I've hardcoded the connection string here, but it is also possible to make it variable depending on a changeable configuration. In that case you could also create a container for the configuration file and have the connection container read its config from there. You then set the config at runtime. As follows:

import MySQLdb
import dependency_injector.containers as containers
import dependency_injector.providers as providers

class ConnectionProvider():
    def __init__(self, connection_config):
        self.conn = MySQLdb.connect(**connection_config)

class ConfigContainer(containers.DeclarativeContainer):
    connection_config = providers.Configuration("connection_config")

class ConnectionContainer(containers.DeclarativeContainer):
    connection_provider = providers.Singleton(ConnectionProvider, ConfigContainer.connection_config)

def do_queries(request, sql):
    user = request.user
    conn = ConnectionContainer.connection_provider().conn
    cursor = conn.cursor()
    cursor.execute(sql)


# run code
my_config = {
    'host':'aaa',
    'user':'bbb',
    'passwd':'ccc',
    'db':'ddd',
    'charset':'utf8'
}

ConfigContainer.connection_config.override(my_config)
request = ...
sql = ...

do_queries(request, sql)
Mandola answered 9/11, 2018 at 7:35 Comment(0)

I don't see why you need a cached connection here, rather than just reconnecting on every request and caching the user's credentials somewhere, but anyway I'll try to outline a solution that might fit your requirements.

I'd suggest looking into a more generic task first: caching something between the subsequent requests your app needs to handle that can't be serialized into django's sessions. In your particular case this shared value would be a database connection (or multiple connections). Let's start with the simple task of sharing a counter variable between requests, just to understand what's actually happening under the hood.
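As a quick illustration of why a live connection can't just go into `request.session`: session data must be serialized, and the socket a DB connection ultimately wraps isn't picklable. A small sketch, with a plain socket standing in for the connection:

```python
import pickle
import socket

s = socket.socket()   # a DB connection ultimately wraps a socket like this
try:
    pickle.dumps(s)
    picklable = True
except TypeError:
    picklable = False
print('picklable?', picklable)   # picklable? False
s.close()
```

So the connection object has to live somewhere in process memory, which is exactly where the web server's process model starts to matter.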

Amazingly, no other answer has mentioned anything about the web server you might use! Actually, there are multiple ways to handle concurrent connections in web apps:

  1. Having multiple processes, every request comes into one of them at random
  2. Having multiple threads, every request is handled by a random thread
  3. p.1 and p.2 combined
  4. Various async techniques, when there's a single process + event loop handling requests with a caveat that request handlers shouldn't block for a long time

In my own experience, p.1-2 are fine for the majority of typical web apps. Apache 1.x could only work with p.1; Apache 2.x can handle all of 1-3.

Let's start with the following django app and run a single-process gunicorn web server. I'm going to use gunicorn because it's fairly easy to configure, unlike apache (personal opinion :-)

views.py

import time

from django.http import HttpResponse

c = 0

def main(self):
    global c
    c += 1
    return HttpResponse('val: {}\n'.format(c))


def heavy(self):
    time.sleep(10)
    return HttpResponse('heavy done')

urls.py

from django.contrib import admin
from django.urls import path

from . import views

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', views.main, name='main'),
    path('heavy/', views.heavy, name='heavy')
]

Running it in a single process mode:

gunicorn testpool.wsgi -w 1

Here's our process tree - there's only 1 worker that would handle ALL requests

pstree 77292
-+= 77292 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 1
 \--- 77295 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 1

Trying to use our app:

curl 'http://127.0.0.1:8000'
val: 1

curl 'http://127.0.0.1:8000'
val: 2

curl 'http://127.0.0.1:8000'
val: 3

As you can see, you can easily share the counter between subsequent requests. The problem here is that you can only serve a single request at a time. If you request /heavy/ in one tab, / won't work until /heavy/ is done.

Let's now use 2 worker processes:

gunicorn testpool.wsgi -w 2

This is what the process tree looks like:

 pstree 77285
-+= 77285 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 2
 |--- 77288 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 2
 \--- 77289 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 2

Testing our app:

curl 'http://127.0.0.1:8000'
val: 1

curl 'http://127.0.0.1:8000'
val: 2

curl 'http://127.0.0.1:8000'
val: 1

The first two requests were handled by the first worker process, and the third by the second worker process, which has its own memory space, so you see 1 instead of 3. Note that your output may differ, because processes 1 and 2 are selected at random, but sooner or later you'll hit a different process.

That's not very helpful for us, because we need to handle multiple concurrent requests, and we'd need to somehow get each request handled by a specific process, which can't be done in the general case.

Most pooling techniques that come out of the box would only cache connections in the scope of a single process; if your request gets served by a different process, a NEW connection would need to be made.

Let's move to threads:

gunicorn testpool.wsgi -w 1 --threads 2

Again, only 1 process:

pstree 77310
-+= 77310 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 1 --threads 2
 \--- 77313 oleg /Users/oleg/.virtualenvs/test3.4/bin/python /Users/oleg/.virtualenvs/test3.4/bin/gunicorn testpool.wsgi -w 1 --threads 2

Now if you run /heavy in one tab you'll still be able to query /, and your counter will be preserved between requests! Even if the number of threads grows or shrinks depending on your workload, it should still work fine.

Problems: you'll need to synchronize access to the shared variable using Python thread synchronization techniques. Another problem is that the same user may need to issue multiple queries in parallel, i.e. open multiple tabs.
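A minimal sketch of that synchronization, guarding the shared counter with `threading.Lock` (shown outside Django so it runs standalone):

```python
import threading

c = 0
c_lock = threading.Lock()

def increment():
    global c
    with c_lock:   # only one thread mutates the shared counter at a time
        c += 1

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c)   # 100
```

Without the lock, `c += 1` is a read-modify-write that two threads can interleave, losing increments.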

To handle it you can open multiple connections on the first request when you have db credentials available.

If a user needs more connections than that, your app can wait on a lock until a connection becomes available.

Back to your question

You can create a class that would have the following methods:

from contextlib import contextmanager
import threading

import MySQLdb


class ConnectionPool(object):

    def __init__(self, max_connections=4):
        self._pools = {}            # session_id -> list of idle connections
        self._max_connections = max_connections
        self._lock = threading.Condition()

    def preconnect(self, session_id, user, password):
        # open the connections up front, while the credentials are at hand
        conns = [
            MySQLdb.connect(user=user, passwd=password)
            for _ in range(self._max_connections)
        ]
        with self._lock:
            self._pools[session_id] = conns

    @contextmanager
    def get_connection(self, session_id):
        with self._lock:
            # wait until another thread returns a connection to the pool
            while not self._pools[session_id]:
                self._lock.wait()
            conn = self._pools[session_id].pop()
        try:
            yield conn
        finally:
            # put it back into the pool and wake up a waiting thread
            with self._lock:
                self._pools[session_id].append(conn)
                self._lock.notify()


pool = ConnectionPool(4)

def some_view(request):
    session_id = request.session.session_key
    with pool.get_connection(session_id) as conn:
        cursor = conn.cursor()
        cursor.execute('...')

This is not a complete solution: you'll still need to somehow evict outdated connections that haven't been used for a long time.

If a user comes back after a long time and his connections have been closed, he'll need to provide his credentials again; hopefully that's OK from your app's perspective.

Also keep in mind that Python threads have their performance penalties; not sure if this is an issue for you.

I haven't checked it on apache2 (too much configuration burden; I haven't used it for ages and generally use uwsgi), but it should work there too. I'd be happy to hear back from you if you manage to run it :)

And also don't forget about p.4 (the async approach). It's unlikely you'll be able to use it on apache, but it's worth investigating; keywords: django + gevent, django + asyncio. It has its pros and cons and may greatly affect your app's implementation, so it's hard to suggest any solution without knowing your app's requirements in detail.
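As a tiny illustration of p.4 (all names here are hypothetical; a sleep stands in for an async driver's connect): with a single process and a single event loop, cached state such as a per-user connection is shared between requests without any worker-process boundaries:

```python
import asyncio

cache = {}  # shared by every request handler: one process, one event loop

async def fake_connect(user):
    # stand-in for an async DB driver's connect() call
    await asyncio.sleep(0.01)
    return object()   # pretend this is a connection

async def handler(user):
    # single-threaded, so no locks against other threads are needed
    # (though concurrent coroutines could still race across the await)
    if user not in cache:
        cache[user] = await fake_connect(user)
    return cache[user]

async def main():
    first = await handler('alice')
    second = await handler('alice')   # reuses the cached "connection"
    return first is second

same = asyncio.run(main())
print(same)   # True: both requests got the same object
```

The trade-off is the caveat above: every handler must be non-blocking, so a synchronous driver like MySQLdb can't be used directly.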

Astounding answered 14/11, 2018 at 23:26 Comment(0)

It is not a good idea to do such a thing synchronously in a web app context. Remember that your application may need to work in a multi-process/multi-thread fashion, and you cannot normally share a connection between processes. So if you create a connection for your user in one process, there is no guarantee that their query requests will arrive at the same one. A better idea may be a single-process background worker that handles connections in multiple threads (a thread per session), runs queries against the database, and returns the results to the web app. Your application should assign a unique ID to each session, and the background worker tracks each thread using that session ID. You may use celery or any other task queue supporting async results. So the design would be something like below:

             |<--|        |<--------------|                   |<--|
user (id: x) |   | webapp |   | queue |   | worker (thread x) |   | DB
             |-->|        |-->|       |-->|                   |-->|

You could also create a queue for each user while they have an active session; as a result, you could run a separate background worker per session.
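A minimal sketch of this design, with one worker thread per session owning that session's connection and tasks delivered over a queue (the `connect`/execute callables are stand-ins for a real driver, so the snippet runs standalone):

```python
import queue
import threading

class SessionWorker(threading.Thread):
    """One background thread per session; it alone owns the DB connection."""

    def __init__(self, connect):
        super().__init__(daemon=True)
        self.tasks = queue.Queue()
        self.connect = connect            # called once, inside this thread

    def run(self):
        conn = self.connect()             # the connection never leaves this thread
        while True:
            sql, result = self.tasks.get()
            if sql is None:               # shutdown sentinel
                break
            result.put(conn(sql))         # stand-in for cursor.execute + fetch

    def query(self, sql):
        result = queue.Queue(maxsize=1)   # acts as an async result slot
        self.tasks.put((sql, result))
        return result.get()               # block until the worker replies

# usage with a fake "connection" that just echoes the SQL
worker = SessionWorker(connect=lambda: (lambda sql: 'ran: ' + sql))
worker.start()
out = worker.query('SELECT 1')
print(out)                    # ran: SELECT 1
worker.tasks.put((None, None))   # shut the worker down
```

Because only the worker thread ever touches the connection, no locking around the connection itself is needed; the queues do all the synchronization.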

Immigration answered 15/11, 2018 at 1:53 Comment(0)

I actually shared my solution to this exact issue. What I did was create a pool of connections with a configurable maximum, and then queue query requests asynchronously through this channel. This way you can leave a certain number of connections open, while it queues and pools asynchronously and keeps the speed you are used to.

This requires gevent and postgres.

Python Postgres psycopg2 ThreadedConnectionPool exhausted

Leet answered 16/11, 2018 at 15:2 Comment(6)
@eatmeimandanish -- this is a great answer, thanks for sharing. In terms of performance, how much faster is it when using the connection pool vs. reconnecting each time? For example, "a typical query that takes 34ms with connection pooling takes 107ms without it" -- do you have any metrics like that?Endosteum
The pool only connects when there is no connection. Once the pool connects it stays open as long as the queue is filled. This is normal behavior. You do not want connections hung in a database that are not doing anything.Leet
I know -- I'm asking what the difference in time is between using a connection pool and not using a connection pool?Endosteum
One uses a set amount of actual connections and reuses those connections for queued queries. Not having a pool would require either just one permanent connection for all queries, or a new connection for every query.Leet
Thanks -- I understand the concept. I am saying: do you have a benchmark (or can you perform a benchmark) on the difference in time between doing a connection pool and doing a normal connect/disconnect on each query? What is the time difference?Endosteum
Oh I see. No I have not done any benchmarks, outside of running it for thousands of people all day long without any problems.Leet

I'm no expert in this field, but I believe that PgBouncer would do the job for you, assuming you're able to use a PostgreSQL back-end (that's one detail you didn't make clear). PgBouncer is a connection pooler, which allows you to reuse connections, avoiding the overhead of connecting on every request.

According to their documentation:

user, password

If user= is set, all connections to the destination database will be done with the specified user, meaning that there will be only one pool for this database.

Otherwise PgBouncer tries to log into the destination database with client username, meaning that there will be one pool per user.

So, you can have a single pool of connections per user, which sounds just like what you want.

In MySQL land, the mysql.connector.pooling module allows you to do some connection pooling, though I'm not sure if you can do per-user pooling. Given that you can set up the pool name, I'm guessing you could use the user's name to identify the pool.
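A per-user pool registry along those lines can be sketched as follows. The `connect` callable is a stand-in so the snippet runs standalone; with mysql.connector you would create the real pool there instead, e.g. a `mysql.connector.pooling.MySQLConnectionPool` with `pool_name` set to the username:

```python
import threading

_pools = {}                  # one connection pool per user, keyed by username
_pools_lock = threading.Lock()

def pool_for(user, connect):
    """Return the user's pool, creating it on first use.

    `connect` stands in for the real driver call; only the first request
    for a given user pays the connection cost.
    """
    with _pools_lock:
        if user not in _pools:
            _pools[user] = connect(user)
        return _pools[user]

first = pool_for('alice', connect=lambda u: ['conn1', 'conn2'])
again = pool_for('alice', connect=lambda u: ['conn1', 'conn2'])
print(first is again)   # True: 'alice' keeps getting her original pool
```

Note this registry lives in one process's memory, so (per the multi-worker discussion elsewhere on this page) each worker process would end up with its own copy of the pools.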

Regardless of what you use, you will likely have occasions where reconnecting is unavoidable (a user connects, does a few things, goes away for a meeting and lunch, comes back and wants to take more action).

Balkh answered 13/11, 2018 at 14:50 Comment(0)

I am just sharing my knowledge over here.

Install PyMySQL to use MySQL:

For Python 2.x

pip install PyMySQL

For Python 3.x

pip3 install PyMySQL

1. If you are open to using the Django framework, then it's very easy to run SQL queries without any re-connection.

In the settings.py file, add the lines below:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'test',
        'USER': 'test',
        'PASSWORD': 'test',
        'HOST': 'localhost',
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}

In the views.py file, add these lines to fetch the data. You can customize the query according to your needs:

from django.db import connection
from django.http import HttpResponse

def connect(request):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM Tablename")
    results = cursor.fetchall()
    return HttpResponse(results)

You will get the desired results.


2. For python Tkinter

from Tkinter import *
import MySQLdb

db = MySQLdb.connect("localhost", "root", "root", "test")
# prepare a cursor object using the cursor() method
cursor = db.cursor()
cursor.execute("SELECT * FROM Tablename")
if cursor.fetchone() is not None:
    print("In If")
else:
    print("In Else")
cursor.close()
db.close()


PS: For reusing a DB connection later, you can check this question:

How to enable MySQL client auto re-connect with MySQLdb?

Palladian answered 14/11, 2018 at 11:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.