Merkle trees (also called hash trees) are used for data synchronization (anti-entropy repair) in both Cassandra and Dynamo.
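Here is a minimal Python sketch of that synchronization step. Two replicas each build a Merkle tree over their data and then compare the trees from the top down. It is not Cassandra's or Dynamo's actual implementation. The leaf count, the key-to-leaf mapping (bucket_of), SHA-256 as the node hash, and all function names are illustrative assumptions.

```python
import hashlib
from typing import Dict, List

NUM_LEAVES = 4  # assumption: a power of two, so every level pairs up evenly


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str) -> int:
    # Hypothetical partitioning: map each key to a leaf by hashing the key.
    return int.from_bytes(h(key.encode())[:4], "big") % NUM_LEAVES


def build_tree(store: Dict[str, str]) -> List[List[bytes]]:
    """Return the tree level by level, leaves first; the last level is [root]."""
    buckets: List[List[str]] = [[] for _ in range(NUM_LEAVES)]
    for key in sorted(store):  # sorted so both replicas hash rows in the same order
        buckets[bucket_of(key)].append(f"{key}={store[key]}")
    level = [h("\n".join(rows).encode()) for rows in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children's hashes concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_leaves(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Descend from the root, following only children whose hashes differ."""
    top = len(a) - 1
    frontier = [0] if a[top][0] != b[top][0] else []
    for depth in range(top - 1, -1, -1):
        frontier = [
            child
            for i in frontier
            for child in (2 * i, 2 * i + 1)
            if a[depth][child] != b[depth][child]
        ]
    return frontier  # indices of leaves whose data must be exchanged


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a, k2="stale")  # replica B holds an out-of-date k2
print(diff_leaves(build_tree(replica_a), build_tree(replica_b)))
```

The traversal only visits subtrees whose hashes differ. Replicas that mostly agree therefore exchange a handful of hashes instead of all their data. The comparison trusts equal hashes completely, so this is exactly where a collision would go unnoticed.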
As with any hash function, different inputs can produce the same hash value: there exist x and y such that x != y but hash(x) = hash(y).
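Such a pair must exist for any hash with a fixed-size output, by the pigeonhole principle: there are more possible inputs than outputs. The sketch below makes this concrete with a deliberately weakened hash. tiny_hash is a hypothetical helper that keeps only the first byte of SHA-256, so it has just 256 possible values. Among 257 distinct inputs, at least two must then share a hash.

```python
import hashlib
from typing import Dict, Tuple


def tiny_hash(data: bytes) -> int:
    # Deliberately weak: keep only the first byte of SHA-256 (256 possible values).
    return hashlib.sha256(data).digest()[0]


def find_collision() -> Tuple[bytes, bytes]:
    seen: Dict[int, bytes] = {}
    # 257 distinct inputs into 256 possible outputs: a collision is guaranteed.
    for i in range(257):
        x = f"value-{i}".encode()
        hx = tiny_hash(x)
        if hx in seen:
            return seen[hx], x
        seen[hx] = x
    raise AssertionError("unreachable by the pigeonhole principle")


x, y = find_collision()
print(x, y, tiny_hash(x) == tiny_hash(y))
```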
As the amount of data stored in a NoSQL database grows, so does the probability of encountering such a collision.
This means that as data sets get bigger, it becomes increasingly likely that two replicas holding different data under the same node of the Merkle tree will compute the same hash for that node.
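The standard birthday-bound approximation gives a rough sense of how fast the collision probability grows with the amount of data. Assume the hash behaves like a uniformly random function with b-bit outputs. Then the probability that at least two of n distinct values share a hash is approximately 1 - exp(-n(n-1) / 2^(b+1)). The sketch below evaluates this for a hypothetical n = 10^12 values at several digest sizes.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound estimate that at least two of n distinct inputs share a hash.

    Assumes the hash behaves like a uniformly random function with 2**bits outputs:
        p ~= 1 - exp(-n * (n - 1) / 2**(bits + 1))
    """
    exponent = n * (n - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)  # expm1 keeps precision when p is tiny


for bits in (32, 64, 128, 256):
    print(bits, collision_probability(10**12, bits))
```

Under these assumptions, the estimate is effectively 1 for 32- and 64-bit digests but only about 1.5e-15 for a 128-bit digest. How fast the risk grows therefore depends heavily on the digest width.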
When that happens and two machines in the cluster traverse their Merkle trees, they get a false positive: the matching hashes say their data is consistent when it is not. If no more data is written to that branch of the tree, the machines will remain unsynchronized forever.
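The false-positive scenario can be reproduced with a hypothetical setup. Each replica holds four leaf payloads, and the node hash is deliberately weakened to 8 bits (the first byte of SHA-256) so that a colliding payload is easy to find. Real systems use far wider digests. The payloads and helper names here are illustrative only.

```python
import hashlib
from typing import List

DIGEST_BYTES = 1  # deliberately weak 8-bit node hash, for demonstration only


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:DIGEST_BYTES]


def merkle_root(leaves: List[bytes]) -> bytes:
    # Assumes len(leaves) is a power of two.
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def colliding_payload(original: bytes, max_tries: int = 100_000) -> bytes:
    """Find a different payload whose weak leaf hash equals that of `original`."""
    target = h(original)
    for i in range(max_tries):
        candidate = b"stale-" + str(i).encode()
        if candidate != original and h(candidate) == target:
            return candidate
    raise RuntimeError("no collision found")


replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = list(replica_a)
replica_b[1] = colliding_payload(replica_a[1])  # k2 has silently diverged on B

print(replica_a == replica_b)                            # False: the data differs
print(merkle_root(replica_a) == merkle_root(replica_b))  # True: the trees "agree"
```

Every level above the colliding leaf is computed from identical hashes. The top-down comparison therefore stops at the root and never reaches the diverged k2 entry.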
How is this handled?