How can MarkLogic have both consistency and availability?
V

2

5

The CAP theorem seems logical to me. I understand that:

If I want consistency in a distributed system, I have to wait for every transaction to propagate. The cost of ACID is the time it takes to replicate data across the whole network.

But how can MarkLogic have both: ACID and a distributed system without lag?
Is it possible to have BASE and ACID properties in the same database?
Is the CAP theorem wrong?

Vehicular answered 7/8, 2015 at 11:45 Comment(0)
R
13

Availability in CAP Theorem is about the hosts that are on either side of the partition, not about the system as a whole.

In CAP Theorem you are "Available" if all hosts on either side of a network partition can continue to accept both read and update transactions. Most of our customers don't care whether all hosts remain available in the face of a network partition. They care that the database as a whole remains available during a network partition. So if the cluster has replicated or shared data so that there is enough data on both sides of the partition to continue to serve queries, and is smart enough to know which side of the partition should remain available and which should gracefully bow out, then the database can remain available in the face of a network partition, even if not all hosts do. That's what MarkLogic does within a cluster.

Between clusters, MarkLogic has many options for how close to absolutely consistent you want to be. We use asynchronous replication to move data between clusters, so if there is a network partition between clusters, the data may not be consistent between those clusters. You can control how large the lag limit is so that you can tune this, and if you need absolute consistency between clusters, we have ways of achieving that as well.
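To make the trade-off concrete, here is a minimal toy model of log shipping between a master and a replica. It is only a sketch under stated assumptions, not MarkLogic's implementation or API: the class `ReplicatedDatabase` and its methods are hypothetical names, and lag is counted in unshipped log entries rather than seconds to keep the example deterministic.

```python
class ReplicatedDatabase:
    """Toy model of a master database replicated to a second cluster.

    Hypothetical illustration only; none of these names are MarkLogic APIs.
    """

    def __init__(self, synchronous=False):
        self.master = {}    # state of the master database
        self.replica = {}   # state of the replica database
        self.log = []       # ordered list of committed updates
        self.shipped = 0    # number of log entries already applied on the replica
        self.synchronous = synchronous

    def commit(self, key, value):
        """Commit on the master; in synchronous mode also wait for the replica."""
        self.master[key] = value
        self.log.append((key, value))
        if self.synchronous:
            # The commit does not return until the replica has caught up,
            # which is where the extra transaction latency comes from.
            self.ship()

    def ship(self):
        """Apply all pending log entries to the replica, in commit order."""
        for key, value in self.log[self.shipped:]:
            self.replica[key] = value
        self.shipped = len(self.log)

    def lag(self):
        """Number of committed updates the replica has not seen yet."""
        return len(self.log) - self.shipped


# Asynchronous mode: commits return immediately, the replica lags behind.
async_db = ReplicatedDatabase(synchronous=False)
async_db.commit("a", 1)
async_db.commit("b", 2)
print(async_db.lag(), async_db.replica)   # 2 {}
async_db.ship()
print(async_db.lag(), async_db.replica)   # 0 {'a': 1, 'b': 2}

# Synchronous mode: the replica is never behind, at the cost of slower commits.
sync_db = ReplicatedDatabase(synchronous=True)
sync_db.commit("a", 1)
print(sync_db.lag(), sync_db.replica)     # 0 {'a': 1}
```

In this model, a network partition between clusters simply means `ship()` cannot run: the asynchronous master keeps accepting commits while the replica goes stale, whereas a synchronous commit would have to block or fail.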

Bottom line is that:

  • Customers care mostly that their database or data services remain available, not that any specific host remain available, so we focus on availability of the system and can provide that without violating CAP Theorem.
  • Multi-cluster MarkLogic deployments can be tuned to give you the right balance of consistency and availability in the face of a network partition.

Hope that helps.

Ramentum answered 7/8, 2015 at 17:40 Comment(5)
So if I understand correctly, by default MarkLogic is ACID at the cluster level but not across the complete database network (because of consistency)? And if I need absolute consistency between clusters, is the system still partition tolerant? - Vehicular
A database lives within a cluster, so for a given database, MarkLogic is ACID. A database can be replicated to a second cluster for disaster recovery. We do this via log-shipping. Within that second cluster, that database is also ACID. However, because the replication is asynchronous, the replica database always lags behind the master database by a few seconds. This lag limit is configurable. You can also configure two MarkLogic clusters to remain always synchronous, but the penalty you pay there is that your transactions will take longer due to high latency between clusters. Make sense? - Ramentum
OK, that makes sense. Two more questions to be sure I understand. The second cluster is for disaster recovery, so you can't query it from a production app, right? And a cluster is generally kept in one data center, or at least on one continent, to limit lag between the nodes of the cluster, right? Thanks for taking the time to answer, I really appreciate it :). - Vehicular
Whether you can query the second cluster depends on your configuration and your license. Sometimes it's used for DR, sometimes used for data geo-location. Clusters are designed to be within one data center, but in some cases you can stretch a cluster between data centers. This is particularly doable on Amazon (where it's regions, not data centers). This is not appropriate for data geo-location (because all queries still go to all hosts in a cluster), but can be a good DR solution if latency is low enough and bandwidth high enough. - Ramentum
You're welcome. One more thing: If you do stretch a cluster between availability regions or data centers, remember that you will need three of these, not two. This is because MarkLogic uses a quorum voting system to determine where the partition is and which side should remain active, and if you only have two data centers, neither one will be able to achieve majority for the quorum vote. - Ramentum
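The "three, not two" point in the last comment can be illustrated with a simplified majority-vote model. This is a sketch only: it assumes one vote per host and a strict-majority rule, which may not match the details of MarkLogic's actual quorum logic, and the function name `surviving_sides` and the site names are hypothetical.

```python
def surviving_sides(hosts_per_site, sides):
    """Return the sides of a partition that hold a strict majority of hosts.

    hosts_per_site: dict mapping a site name to its number of hosts.
    sides: list of lists of site names, one list per side of the partition.
    Simplifying assumption: every host has exactly one vote.
    """
    total = sum(hosts_per_site.values())
    winners = []
    for side in sides:
        votes = sum(hosts_per_site[site] for site in side)
        if 2 * votes > total:   # strict majority; a tie is not enough
            winners.append(sorted(side))
    return winners


# Two data centers with equal host counts: a partition between them is a tie,
# so neither side reaches quorum.
print(surviving_sides({"dc1": 3, "dc2": 3}, [["dc1"], ["dc2"]]))
# []

# Three data centers: if dc3 is cut off, dc1 + dc2 still hold 4 of 6 votes.
print(surviving_sides({"dc1": 2, "dc2": 2, "dc3": 2}, [["dc1", "dc2"], ["dc3"]]))
# [['dc1', 'dc2']]
```

At most one side can ever hold a strict majority, which is what prevents both sides of a partition from accepting conflicting updates.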
T
3

The CAP theorem is not wrong, it's just outdated. Here's the update from the author: CAP Twelve Years Later: How the "Rules" Have Changed.

MarkLogic supports ACID properties via MVCC. If you like, you could configure it to behave with BASE properties instead. The key, as I understand it, is to design and optimize for your production requirements. MarkLogic has a host of replication features available and we're constantly adding to that portfolio as our customers solve real-world problems deploying globally-distributed clusters.

Have you read Inside MarkLogic Server? That white paper does a great job of explaining how MarkLogic solves many of these challenges.

Telegraph answered 7/8, 2015 at 15:52 Comment(0)
