Database cluster and load balancing
Asked Answered
V

4

167

What is database clustering? If you allow the same database to be on 2 different servers how do they keep the data between synchronized. And how does this differ from load balancing from a database server perspective?

Valaria answered 22/7, 2009 at 5:32 Comment(0)
T
140

Database clustering is a bit of an ambiguous term, some vendors consider a cluster having two or more servers share the same storage, some others call a cluster a set of replicated servers.

Replication defines the method by which a set of servers remain synchronized without having to share the storage being able to be geographically disperse, there are two main ways of going about it:

  • master-master (or multi-master) replication: Any server can update the database. It is usually taken care of by a different module within the database (or a whole different software running on top of them in some cases).

    Downside is that it is very hard to do well, and some systems lose ACID properties when in this mode of replication.

    Upside is that it is flexible and you can support the failure of any server while still having the database updated.

  • master-slave replication: There is only a single copy of authoritative data, which is the pushed to the slave servers.

    Downside is that it is less fault tolerant, if the master dies, there are no further changes in the slaves.

    Upside is that it is easier to do than multi-master and it usually preserve ACID properties.

Load balancing is a different concept, it consists distributing the queries sent to those servers so the load is as evenly distributed as possible. It is usually done at the application layer (or with a connection pool). The only direct relation between replication and load balancing is that you need some replication to be able to load balance, else you'd have a single server.

Tinned answered 22/7, 2009 at 5:44 Comment(5)
Ah, forgot about replication :) Yes, you can achieve load balancing that way in combination w/ application level logic :). +1Oversell
Postgresql docs refer to "database cluster" differently: "Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (SQL uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. " postgresql.org/docs/8.3/static/creating-cluster.htmlWilhelm
What does ACID properties mean, or rather, what is it exactly you loose if you don't preserve them?Invocate
@Invocate In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc. In the context of databases, a sequence of database operations that satisfies the ACID properties (and these can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.Oligopoly
"some vendors consider a cluster having two or more servers share the same storage, some others call a cluster a set of replicated servers." by "servers" you mean "database servers"?Sag
O
19

From SQL Server point of view:

Clustering will give you an active - passive configuration. Meaning in a 2 node cluster, one of them will be the active (serving) and the other one will be passive (waiting to take over when the active node fails). It's a high availability from hardware point of view.

You can have an active-active cluster, but it will require multiple instances of SQL Server running on each node. (i.e. Instance 1 on Node A failing over to Instance 2 on Node B, and instance 1 on Node B failing over to instance 2 on Node A).

Load balancing (at least from SQL Server point of view) does not exists (at least in the same sense of web server load balancing). You can't balance load that way. However, you can split your application to run on some database on server 1 and also run on some database on server 2, etc. This is the primary mean of "load balancing" in SQL world.

Oversell answered 22/7, 2009 at 5:42 Comment(0)
B
10

Clustering uses shared storage of some kind (a drive cage or a SAN, for example), and puts two database front-ends on it. The front end servers share an IP address and cluster network name that clients use to connect, and they decide between themselves who is currently in charge of serving client requests.

If you're asking about a particular database server, add that to your question and we can add details on their implementation, but at its core, that's what clustering is.

Bilander answered 22/7, 2009 at 5:54 Comment(0)
I
8

Database Clustering is actually a mode of synchronous replication between two or possibly more nodes with an added functionality of fault tolerance added to your system, and that too in a shared nothing architecture. By shared nothing it means that the individual nodes actually don't share any physical resources like disk or memory.

As far as keeping the data synchronized is concerned, there is a management server to which all the data nodes are connected along with the SQL node to achieve this(talking specifically about MySQL).

Now about the differences: load balancing is just one result that could be achieved through clustering, the others include high availability, scalability and fault tolerance.

Irisation answered 10/1, 2013 at 16:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.