What are the use cases for a Vector Clock versus a Version Vector?

Asked 24/10, 2019 at 15:20 Answered 7/2, 2023 at 5:8

data-structures synchronization replication distributed-computing distributed-system

I have been having trouble finding an example of what use cases are suitable for Vector Clocks and Version Vectors, and how they might differ. I understand that they largely work in the same way, with Vector Clocks using receive and send functions, and Version Vectors using a sync function instead, but I do not understand the differences between the two options. Is it just two different ways of expressing the same thing, or are there real differences in use cases between them?

I was only able to find one question that was somewhat related: "When do I use a consensus algorithm like Paxos vs using a something like a Vector Clock?"

Even though the linked answer states the following, and references a short article, the differences are still unclear to me.

You might want to use a version vector for a leaderless distributed storage. You might use vector clocks for the same (although it's a worse fit; the article also suggests you use it for consistent snapshots, for implementing causal ordering in general distributed systems etc).

Albinus answered 24/10, 2019 at 15:20 Comment(3)

This article provides a pretty good overview of both: haslab.wordpress.com/2011/07/08/… – Trampoline 9/1, 2020 at 23:8

One of the best books on distributed systems concepts: pdfs.semanticscholar.org/24f1/…. It has a detailed explanation of how time/clock differences between machines in a distributed system can give a wrong impression – Sheryllshetland 7/2, 2020 at 13:13

@Sheryllshetland The link is broken. According to the web archive, it was Designing Data-Intensive Applications – Capetian 7/2, 2023 at 2:52

I had the same question, and it is still not entirely clear to me, but what I have found is that version vectors are better suited to determining the causality of updates within a specific set of replicated nodes in a distributed system, where the only thing you care about is which update happened first and which happened after.

By contrast, a vector clock orders events in an arbitrary sequence of events across a distributed system.

In that sense, using integers for version vectors is more than is strictly needed if all we want to determine is which node, A or B, is more up to date. Take a situation where initially we have A[2,2] and B[2,2] (so the two are in sync).

From the version vector perspective, A[3,2] > B[2,2] means the same as A[10,2] > B[2,2]: A is ahead of B. That would explain why we can use a fixed set of values for version vectors, and why the only important operation is syncing versions.

From the vector clock perspective, there is a difference between A[10,2] and A[3,2]: seven more events happened in the meantime. That would explain why we need to keep track of all the events, and why there are send and receive operations to sync all the vector clocks in the network.
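To make the comparison concrete, here is a minimal sketch in Python. The function name `compare` and the string results are my own choices for illustration, not part of any library. It compares two vectors entry by entry and shows that, for the purpose of deciding who is ahead, [3,2] and [10,2] relate to [2,2] in the same way.

```python
def compare(x, y):
    """Compare two equal-length vectors entry by entry.

    Returns "equal", "after" (x is ahead of y), "before" (x is behind y),
    or "concurrent" (each side has seen something the other has not).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    x_not_behind = all(a >= b for a, b in zip(x, y))
    y_not_behind = all(b >= a for a, b in zip(x, y))
    if x_not_behind and y_not_behind:
        return "equal"
    if x_not_behind:
        return "after"
    if y_not_behind:
        return "before"
    return "concurrent"


in_sync = compare([2, 2], [2, 2])
one_ahead = compare([3, 2], [2, 2])
eight_ahead = compare([10, 2], [2, 2])
diverged = compare([3, 2], [2, 3])
```

Both `one_ahead` and `eight_ahead` come out as "after", so the size of the gap plays no role in deciding which replica is newer. The last case is the one neither side wins: the vectors are concurrent and some conflict resolution is needed.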

Anyways, I'm missing like you some clear document that explains clearly the difference and the usages of one compared to the other.

Durra answered 10/12, 2019 at 14:2 Comment(1)

I'm missing like you some clear document that explains clearly the difference and the usages of one compared to the other.

Did you find this document or maybe a hands-on/practical tutorial about version vectors? Thank you! – Drum 1/11, 2021 at 20:6

These two are easy to confuse because they share the same merging mechanism, but their use cases are different.

A Vector Clock is used to determine the partial ordering of events in a distributed system. It can detect causality violations and concurrent events. It uses receive and send to track the order of events across nodes, but it does not synchronize state.
A Version Vector is used to compare the state of items across multiple replicas (for example, of a DB). This is why it uses a sync function: it is trying to synchronize the state between all of them.

With a Vector Clock, events E_a and E_b carry no inherent order, but we can work out how they relate by comparing their vectors.

 ┌──┐                 ┌──┐  ┌──┐
 │A0│                 │A1│  │A2│
 │  │      ...        │B2│  │B2│ <- E_a
 │  │                 │C1│  │C1│
 └──┘                 └──┘  └──┘
---------------INDEPENDENT---------------
 ┌──┐      ┌──┐  ┌──┐       ┌──┐
 │  │      │  │  │  │       │  │
 │B0│ ...  │B1│  │B2│       │B3│ <- E_b
 │  │      │C1│  │C1│       │C1│
 └──┘      └──┘  └──┘       └──┘
---------------INDEPENDENT---------------
 ┌──┐  ┌──┐
 │  │  │  │
 │  │  │  │
 │C0│  │C1│
 └──┘  └──┘

A0 means that item A is at version 0.
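As a rough sketch of the send and receive mechanics, the Python below replays one possible message history that produces the vectors in the diagram. The `Process` class, the helper functions, and the exact sequence of messages are assumptions made for illustration; the diagram itself does not say which messages were exchanged.

```python
class Process:
    """A node that keeps a vector clock as a dict of node name -> counter."""

    def __init__(self, name):
        self.name = name
        self.clock = {}

    def tick(self):
        # Any local event advances this node's own counter.
        self.clock[self.name] = self.clock.get(self.name, 0) + 1
        return dict(self.clock)

    def send(self):
        # Sending is an event; the returned copy travels with the message.
        return self.tick()

    def receive(self, stamp):
        # Take the entry-wise maximum, then count the receive as an event.
        for node, count in stamp.items():
            self.clock[node] = max(self.clock.get(node, 0), count)
        return self.tick()


def _ahead(x, y):
    # True if x has a larger counter than y for at least one node.
    return any(x.get(k, 0) > y.get(k, 0) for k in x.keys() | y.keys())


def happened_before(x, y):
    return _ahead(y, x) and not _ahead(x, y)


def concurrent(x, y):
    return _ahead(x, y) and _ahead(y, x)


a, b, c = Process("A"), Process("B"), Process("C")
m1 = c.send()      # C1
b.receive(m1)      # B1, C1
m2 = b.send()      # B2, C1
a.receive(m2)      # A1, B2, C1
e_a = a.tick()     # A2, B2, C1
e_b = b.tick()     # B3, C1
```

Here `concurrent(e_a, e_b)` is true, because A has counted events B has not seen and B has counted one that A has not seen, while `happened_before(m1, e_b)` is true, because B had already received C's message.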

With a Version Vector, when we read data from multiple DB instances, we may find that they hold different versions of the data, and we merge the conflicting versions. This helps the state become eventually consistent.

                                                 Final state
User A  │ +Egg v1              +Milk +Ham v3    │Egg+Ham+Milk, v3
        │  │   ^                  │   ^         │
        │  V   │                  V   │ Ham v3  │
Cart DB │  ok,v1    ok,v2         ok,v3         │
        │           ^   │ Egg v2                │
        │           │   V                       │
User B  │        +Ham  +Egg v2                  │Egg+Ham, v2
-------------- Time ----------->

In this case, there is only one DB instance to keep the illustration simple, but you can think of it as multiple DB instances with replication delay between them. When you read the data, you also get the version vector, and you use the version vectors from each replica to resolve conflicts caused by writes that went to different replicas during the replication lag.
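The read path described above could look roughly like the following Python sketch. The replica names `r1`, `r2`, `r3`, the function `read_cart`, and the choice of set union as the merge rule are all assumptions for illustration. Note that a plain union cannot express removing an item from the cart.

```python
def dominates(x, y):
    # True if x has seen at least everything y has seen.
    return all(x.get(k, 0) >= y.get(k, 0) for k in x.keys() | y.keys())


def read_cart(replies):
    """Merge replies of the form (items: set, version_vector: dict)."""
    # Drop every reply that is strictly older than some other reply.
    siblings = []
    for i, (items, vv) in enumerate(replies):
        superseded = any(
            j != i and dominates(other, vv) and not dominates(vv, other)
            for j, (_, other) in enumerate(replies)
        )
        if not superseded:
            siblings.append((items, vv))

    # What is left is either one winner or several concurrent versions.
    merged_items = set().union(*(items for items, _ in siblings))
    merged_vv = {}
    for _, vv in siblings:
        for node, count in vv.items():
            merged_vv[node] = max(merged_vv.get(node, 0), count)
    return merged_items, merged_vv


replies = [
    ({"Egg", "Milk"}, {"r1": 2}),          # two writes accepted by r1
    ({"Egg", "Ham"}, {"r1": 1, "r2": 1}),  # saw r1's first write, then wrote on r2
    ({"Egg"}, {"r1": 1}),                  # stale replica
]
cart, version = read_cart(replies)
```

The stale reply is discarded because both other replies dominate it. The first two replies are concurrent, so their carts are merged and the resulting vector is the entry-wise maximum of the two.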


Stereograph answered 10/9, 2021 at 8:27 Comment(0)

-1

They are pretty much the same thing.

In fact, sync is a high-level abstraction that has to be implemented with send and receive.

To make it clearer, imagine a case where for the same object (or key if it is a key-value store), two replicas now have so-called version vectors [1,0] and [0,1] respectively. Now how should they synchronize?

They can't, unless you explicitly implement a merge strategy to handle conflicts. So sync doesn't just come out of nothing. It is a high-level API that internally contains a merge strategy, as well as calls to send and receive.
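A minimal Python sketch of that idea follows. The `Replica` class, its method names, and the set-union merge passed in at the end are assumptions for illustration. The point is only that `sync` is a thin wrapper around `send`, `receive`, and a merge strategy supplied by the caller.

```python
class Replica:
    def __init__(self, index, size):
        self.index = index          # this replica's slot in the vector
        self.vector = [0] * size
        self.value = None

    def write(self, value):
        # Only writes advance the version.
        self.value = value
        self.vector[self.index] += 1

    def send(self):
        return list(self.vector), self.value

    def receive(self, message, merge):
        vector, value = message
        pairs = list(zip(self.vector, vector))
        mine_covers = all(m >= t for m, t in pairs)
        theirs_covers = all(t >= m for m, t in pairs)
        if theirs_covers and not mine_covers:
            self.value = value                      # sender is strictly newer
        elif not mine_covers and not theirs_covers:
            self.value = merge(self.value, value)   # conflict: ask the strategy
        self.vector = [max(m, t) for m, t in pairs]


def sync(a, b, merge):
    # Exchange state in both directions; both messages are taken up front.
    message_a, message_b = a.send(), b.send()
    b.receive(message_a, merge)
    a.receive(message_b, merge)


a, b = Replica(0, 2), Replica(1, 2)
a.write({"x"})   # a.vector is now [1, 0]
b.write({"y"})   # b.vector is now [0, 1]
sync(a, b, merge=lambda mine, theirs: mine | theirs)
```

After the call both replicas hold the vector [1, 1] and the merged value. For the replicas to converge, the merge function has to give the same result regardless of argument order, which set union does.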

Now if you consider both vector clocks and version vectors from the perspective of send and receive, the difference between them is mostly conceptual.

Vector clocks are CLOCKS, used to order events in a distributed system. Each time a node sends, receives, or even updates something internally, an event occurs.

Version vectors order VERSIONS of replicas in a distributed data store, which is a specific kind of distributed system. Note that not all events change the version of a replica - only write requests/events do. So instead of concerning ourselves with events, it is better to conceptually focus only on versions of the replicas, which are the result of the events.

See Why Logical Clocks are Easy for a gentle introduction to related concepts.

Capetian answered 7/2, 2023 at 5:8 Comment(0)
