How can MySQL Cluster 7.3 achieve 99.999% availability? An antithesis to the CAP theorem

According to the "Guide to Scaling Web Databases with MySQL Cluster", MySQL Cluster 7.3 can achieve 99.999% availability while using synchronous update replication. This would be an antithesis to the CAP theorem, which states that perfect availability (99.999% can be seen as that, no?) and consistency are not achievable together in a distributed system.

How would the cluster react to an update if the data node responsible for the replica is not reachable? With synchronous update replication it would have to block, which would affect availability.

The Guide states:

  • The data within a data node is synchronously replicated to all nodes within the Node Group. If a data node fails, then there is always at least one other data node storing the same information.
  • In the event of a data node failure, the MySQL Server or application node can use any other data node in the node group to execute transactions. The application simply retries the transaction and the remaining data nodes will successfully satisfy the request.

But how can this work if a node group consists of two nodes and one crashes (example here)? There would be no node left to replicate an update to, which, as far as I understand, would make the update fail under synchronous update replication. Or is replication simply suspended for as long as there is no node left to write a replica to?
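
To make the scenario concrete, here is a minimal config.ini sketch of the kind of two-node setup I mean (hostnames and node IDs are placeholders I made up): with NoOfReplicas=2 and exactly two data nodes, both nodes form a single node group, and each node holds a full, synchronously updated copy of the data.

    # Management node
    [ndb_mgmd]
    NodeId=1
    HostName=mgmt.example.com

    # Two replicas per fragment => the two data nodes form one node group
    [ndbd default]
    NoOfReplicas=2

    # Data node 1: holds one replica of every fragment
    [ndbd]
    NodeId=2
    HostName=data1.example.com

    # Data node 2: holds the second, synchronously updated replica
    [ndbd]
    NodeId=3
    HostName=data2.example.com

    # SQL node (MySQL Server) through which applications connect
    [mysqld]
    HostName=sql1.example.com

If data node 3 crashes, node 2 is the only surviving member of the node group, and that is exactly the situation my question is about.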

Conn answered 3/7, 2013 at 11:38 Comment(0)

With master-master replication, if the connection between the hosts goes down and you still alter data on either host, then to keep this kind of availability you necessarily break consistency: the hosts can no longer synchronize, so their data diverges. Consider the following cases:

Case 1: Getting A and C but not P

For example, if I don't replicate a database, then the whole database lives on a single host. Here we get Consistency and Availability, but not Partition tolerance.

Case 2: Getting C and P but not A

For example, if I replicate a database (master-master) and keep one copy on each of two hosts: copy P1 on host H1 and copy P2 on host H2. To introduce a partition, I cut the connection between H1 and H2. To keep consistency, I must not allow anyone to change either P1 or P2, and so we lose Availability.
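
A hedged sketch of Case 2 in plain MySQL terms (the accounts table is made up for illustration): while the link is down, both hosts simply refuse writes, keeping the copies identical at the price of availability.

    -- On both H1 and H2, while the replication link is down,
    -- refuse ordinary writes so the two copies cannot diverge:
    SET GLOBAL read_only = ON;

    -- Any client write (by a user without SUPER) is now rejected:
    UPDATE accounts SET balance = 100 WHERE id = 1;
    -- ERROR 1290 (HY000): The MySQL server is running with the
    -- --read-only option so it cannot execute this statement

(Users with the SUPER privilege bypass read_only; newer MySQL versions add super_read_only to close that gap.)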

Case 3: Getting A and P but not C

For example, if I replicate a database (master-master) and keep one copy on each of two hosts: copy P1 on host H1 and copy P2 on host H2. To introduce a partition, I cut the connection between H1 and H2. To keep availability, I allow anyone to change either P1 or P2, and so we lose Consistency.
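
And a matching sketch of Case 3 (same made-up accounts table): both hosts keep accepting writes during the partition, so the two copies diverge.

    -- The replication link between H1 and H2 is down,
    -- but both masters keep accepting writes.

    -- A client connected to H1:
    UPDATE accounts SET balance = 100 WHERE id = 1;

    -- Meanwhile, a client connected to H2:
    UPDATE accounts SET balance = 200 WHERE id = 1;

    -- Both writes succeed (Availability + Partition tolerance),
    -- but the two copies of row id = 1 now disagree (no Consistency):
    SELECT balance FROM accounts WHERE id = 1;
    -- returns 100 on H1 and 200 on H2 until the conflict is reconciled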

Befog answered 15/7, 2019 at 10:40 Comment(0)

In your example question, the problem does not involve a partition. A partition means part of the data stays on one node and the rest on another node (it does not need to be a 50/50 split, but the data has to be spread over several nodes).

Also, in your example, if one of the nodes crashes, the other keeps working, so you still have availability. And because each node is a replica of the other, you should have no problems with consistency.

Just because the update fails does not mean the data is inconsistent. If you read the data from the cluster, you will get consistent data, because you cannot retrieve the inconsistent data from the dead node.

In other words, you only have inconsistent data if you query the cluster and the data retrieved is inconsistent.

Egerton answered 3/7, 2013 at 11:51 Comment(2)
Maybe we have availability for reads, but if only one node per node group is left and the number of replicas is defined as two, the remaining node cannot accept a write, since it cannot update the second replica? Or am I getting something wrong here? – Conn
@Conn No: if all nodes are masters, they will accept update requests and synchronize with the remaining nodes. If a node is dead it of course cannot be updated, but the system will continue to operate when a node dies; that is why you have multiple nodes instead of one. If you don't want the system to stop accepting update requests, you should have multiple masters instead of a master-slave architecture, or have a slave node be promoted to be the new master. – Egerton
