Django Multiple Databases Fallback to Master if Slave is down
Asked Answered
C

1

18

I have master - slave replication setup for MySQL db backend for Django. Currently I am reading and writing for Master DB only, but my dashboards are quite query intensive. I was searching for an option, where in I can Define like following DATABASES

DATABASES = {
'default_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  

for dashboards, reports and various other apps, what I want to do is:

Try connection : default_slave : using default_slave if reachable else using default

That is, if slave is up, fetch the reports from slave database itself, if not fetch the reports from master database.

Catch is, slave can be up or down, and i want this to be dynamically selectable regarding which database to use for fetching reports, based on rechability.

Is this possible ? Can I test connection before hand and move forward ?

With this I would write and sync_db in Master, and always read from Slave, if slave is up.

Need some solution/hint for raw queries as well as orm queries

The router concept seems nice, but fallback on slave not reachable, i don't know the possibility.

UPDATE

How to go about MULTI-DATABASE

DATABASES

DATABASES = {
'default_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  
'linux': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  
'linux_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  
'mac': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  
'mac_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  
'pc': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  
'pc_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '', 
        'PORT': '3306',
    },
}  

Now, I have 1. Static Data 2. Dynamic Data

Static Data has to be stored in 'default' which will be replicated to 'default_slave'

For dynamic data, the query first needs to judge where might the Dynamic Data be lying : in 'mac' or in 'pc' or in 'linux'

To Achieve that, I added one field in 'static table' : 'query_on' which contains either ['mac' or 'linux' or 'pc']

Now, whicl using query set, I am simply writing static = Static.objects.get(pk = 1)
query_on = static.query_on dynamic = Dynamic.objects.get(static = static).using(alias=query_on)

This works well, the query routes to the database it needs to get executed, here I need to judge :

  1. If <'query_on'>_slave : connection is up : use : <'query_on'>_slave OR
  2. If <'query_on'>_slave : connection is down : use : <'query_on'>

How to go about that ?

Further details for the application:

  1. There is one database : default (configuration & analytics database) : for maintaining the configuration data and report analytics data
  2. There are 20 databases (raw databases) : as example says : mac, linux, rhel, windows, pc .... (example name) : for collecting the raw data, which is not processes for analytics
  3. each database has one or multiple slaves, naming convention would be : default_slave_0, default_slave_1, default_slave_2 and so for other databases as well

Now the analytics data needs to be first queried per 5 minutes, 30 minutes, 1 hour .... and that query needs to be sent out to specific database, because not every database will be carrying specific dataset required for analytics.

To do that, we need to

  1. get the configuration data from (default or any one of its slave (slave part is the question) )
  2. once we have the configuration, we can easily look into where the "raw" data might be
  3. query for raw data, and gather results and analyse --> store it in "default" database.

Now all the 30 (raw) and 1 default database, would require "sync", as we maintain the same data abse structure throughout all the nodes.

Now, since we are looking at CPU spikes on all the databases, it makes sense to use "slave" databases to query for "raw" data.

hence the requirement of using . I am not able to imagine how routers would be of help here ?

Cupped answered 28/10, 2014 at 12:54 Comment(0)
G
19

You are on the right track with using a router. I'm assuming the fact that your two db definitions are identical is just a typo.

(FYI, I'm going to refer to the database hierarchy using the more sensitive master->follower)

In your db_for_read() functions, you can check for connectivity to your follower. This might incur a bit more overhead, but that's the cost of having auto-failover for a database. An example database definition would be:

DATABASES = {
'follower': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'follower',
        'USER': 'root',
        'HOST': '54.34.65.24',
        'PORT': '3306',
    },
'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'HOST': '54.34.65.23',
        'PORT': '3306',
    },
}  

You can test the connection with a quick try/except like this example. A router using this that does what you need would look like:

from django.conf import settings
import socket


def test_connection_to_db(database_name):
    try:
        db_definition = getattr(settings, 'DATABASES')[database_name]
        s = socket.create_connection((db_definition['HOST'], db_definition['PORT']), 5)
        s.close()
        return True
    except (AttributeError, socket.timeout) as e:
        return False


class FailoverRouter(object):
    """A router that defaults reads to the follower but provides a failover back to the default"""

    def db_for_read(self, model, **hints):
        if test_connection_to_db('follower'):
            return 'follower'
        return 'default'

    def db_for_write(self, model, **hints):
        "Point all writes to the default db"
        return 'default'

    def allow_syncdb(self, db, model):
        "Make sure only the default db allows syncdb"
        return db == 'default'

This will still syncdb in master like you want. Also, you could make the logic for both db_for_read() and db_for_write() more complicated (like pick the follower db only for certain models that are queried for your reports.

I don't know what overhead this test_connection() will cause for every read, since that will depend on the MySQL server and the timeout. Perhaps a better architecture is to cache these reports using memcached, or just work out the issues with the slave ever going down and update your database definitions in settings first.

Gatias answered 13/11, 2014 at 18:48 Comment(8)
Thanks. That ought-to work. What I am planning to do is, .using(alias = defualt_slave) if connection works, cool if not, using= default (split with _slave) . that would be a quick cook up.Cupped
please check the update in the question. I am not able to resolve the problem yet, using router does not make sense when query set is having using . @GatiasCupped
Instead of passing query_on to the using property, try looking up that value in the router, not when you create the queryset. That way, you can still let the router decide what db to use.Gatias
first of, sorry for answer edit. it was a mistake. I have made changes to question please check. @GatiasCupped
If you are wanting to target a specific database to gather "raw" data, I don't see why it is a problem to just use the .using() modifier. That would hit the exact database you are intending, then you can store that information in the default database using your router.Gatias
using() is not a problem :) I am choosing the exact database, I just want to use a specific slave first, if slave is up else use the master for the database. got my problem ? thanks for all your help. this is what confusing me @GatiasCupped
Currently, routers only receive a hint of the instance being saved or read, and it isn't always present. Otherwise, you could just pass your database prefix to it as an argument and let it decide whether to connect to the default or the follower. But since that is the case, just do the test_connection_to_db() right where you are adding the .using() modifier. You can put it in some helper file that can be referenced from multiple querysets, but that logic can just be done wherever you need to calculate the db name for .using()Gatias
If I got the idea correctly, this solution is about pinging the server on each get_queryset and using calls, isn't it?Impetuous

© 2022 - 2024 — McMap. All rights reserved.