How to handle split-brain?

I have read in the Orleans FAQ when split-brain can happen, but I don't understand what harm it can cause or how to handle it properly.

The FAQ says something vague like:

You just need to consider the rare possibility of having two instances of an actor while writing your application.

But how exactly should I account for this, and what can happen if I don't?

The Orleans paper (http://research.microsoft.com/pubs/210931/Orleans-MSR-TR-2014-41.pdf) says this:

application can rely on external persistent storage to provide stronger data consistency

But I don't understand what this means.

Suppose a split-brain happens, so I now have two instances of one grain (or could there be even more?). When I send a few messages, they could be received by different instances. Suppose each instance had the same state before receiving these messages. After processing them, the instances have different states.

How should they persist their states? There could be a conflict.

When the other instances are destroyed and only one remains, what happens to the state of the destroyed instances? Will it be as if the messages they processed had never been processed? Then client state and server state could become desynchronized, IIUC.

I see this (split-brain) as a big problem and I don't understand why there is so little attention to it.

Naara answered 7/2, 2016 at 1:53 Comment(0)

Orleans leverages the consistency guarantees of the storage provider. When you call this.WriteStateAsync() from a grain, the storage provider checks that the grain has seen all previous writes (providers typically do this by comparing an ETag stored alongside the state). If it has not, an InconsistentStateException is thrown. You can catch that exception and either call DeactivateOnIdle() and rethrow it, or call ReadStateAsync() and retry. So if you have two activations of the same grain during a split-brain scenario, whichever one calls WriteStateAsync() first prevents the other from writing its state without first reading the most up-to-date state.
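
To make the write check concrete, here is a minimal, language-neutral sketch in Python. It is a toy model, not the Orleans API: `Store`, `Activation`, `InconsistentStateError`, and the integer version used in place of an ETag are all hypothetical names chosen for illustration. It shows two activations that loaded the same state, where the second writer is rejected, re-reads, and retries.

```python
class InconsistentStateError(Exception):
    """Toy stand-in for the exception thrown on a stale write."""


class Store:
    """Hypothetical storage provider: one record guarded by a version (ETag)."""

    def __init__(self, state):
        self.state = state
        self.version = 0

    def read(self):
        return self.state, self.version

    def write(self, state, expected_version):
        # Reject the write if the caller has not seen the latest version.
        if expected_version != self.version:
            raise InconsistentStateError("stale version")
        self.state = state
        self.version += 1
        return self.version


class Activation:
    """Toy model of one in-memory activation of a grain."""

    def __init__(self, store):
        self.store = store
        self.read_state()

    def read_state(self):
        self.state, self.version = self.store.read()

    def write_state(self):
        self.version = self.store.write(self.state, self.version)

    def add(self, amount):
        """Apply an update; on conflict, re-read and retry once."""
        try:
            self.state += amount
            self.write_state()
        except InconsistentStateError:
            self.read_state()      # discard the stale in-memory state
            self.state += amount   # re-apply on top of the latest state
            self.write_state()


store = Store(0)
a = Activation(store)  # two activations of the same grain (split-brain)
b = Activation(store)

a.add(5)  # first writer wins
b.add(7)  # stale version, so b re-reads 5 and then writes 12
print(store.state)
```

Neither update is lost here because the second activation retries on top of the stored state. The alternative mentioned above (deactivate and rethrow) would instead surface the failure to the caller, who then decides whether to resend the message.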

Update: Starting in Orleans v1.5.0, a grain which allows an InconsistentStateException to be thrown back to the caller will automatically be deactivated when the currently executing calls complete. A grain can catch and handle the exception to avoid automatic deactivation.

Inhambane answered 7/2, 2016 at 2:43 Comment(5)
Is it true that "an exception is thrown and Orleans will read state before allowing the grain to process the next message"? As far as I can see, InconsistentStateException is not handled by the framework. - Marieann
@LaszloMagyar your suspicion is correct. I've opened github.com/dotnet/orleans/issues/1420 and corrected the answer. Thanks :) - Inhambane
That still doesn't cover the scenario where one grain receives read-only messages and the second one gets read/write messages: the former will respond with outdated data. - Ti
@shay__, that's right: you have to force all methods to perform a read/write from/to storage if you want consistency. Consistent databases often replicate a no-op operation in order to ensure that they are still the primary instance (there are optimized approaches, but the gist is the same). Similarly, since Orleans pushes consistency down to the storage provider, a call to read/write must be used to ensure consistency on every operation which requires it. - Inhambane
The behavior changed in v1.5.0, so I've updated the answer to reflect that. - Inhambane
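
The stale-read scenario raised in the comments can be sketched with the same kind of toy model. Again, this is Python for illustration only and not the Orleans API; `Store`, `Activation`, `get_cached`, and `get_consistent` are hypothetical names. One activation answers from its in-memory copy, the other goes back to storage before answering.

```python
class Store:
    """Hypothetical storage provider holding one versioned record."""

    def __init__(self, state):
        self.state = state
        self.version = 0

    def read(self):
        return self.state, self.version

    def write(self, state, expected_version):
        # Optimistic concurrency check, as in the write path.
        if expected_version != self.version:
            raise RuntimeError("stale version")
        self.state = state
        self.version += 1
        return self.version


class Activation:
    """Toy model of one in-memory activation of a grain."""

    def __init__(self, store):
        self.store = store
        self.state, self.version = store.read()

    def set(self, value):
        self.state = value
        self.version = self.store.write(value, self.version)

    def get_cached(self):
        # Fast, but may return data another activation has overwritten.
        return self.state

    def get_consistent(self):
        # Re-read from storage before answering.
        self.state, self.version = self.store.read()
        return self.state


store = Store("v1")
reader = Activation(store)  # receives only read-only messages
writer = Activation(store)  # receives the read/write messages

writer.set("v2")
stale = reader.get_cached()      # still the old in-memory value
fresh = reader.get_consistent()  # reflects the latest stored value
print(stale, fresh)
```

The trade-off is a storage round trip on every read that needs to be up to date, so it is presumably worth applying only to the operations that actually require it.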
