Should users be directed to specific data nodes when using an eventually consistent datastore?

When running a web application in a farm that uses an eventually consistent distributed datastore (CouchDB in my case), should I be ensuring that a given user is always directed to the same datastore instance?

It seems to me that the alternative approach, where any web request can use any datastore instance, adds significant complexity to deal with consistency issues (retries, checks, etc.). On the other hand, if a user in a given session is always directed to the same couch node, won't my consistency issues revolve mostly around "shared" user data and thus be greatly simplified?

I'm also curious about strategies for directing users but maybe I'll keep that for another question (comments welcome).

Darill answered 10/4, 2015 at 13:56 Comment(0)

According to the CAP theorem, when a network partition occurs, a distributed system must choose between consistency (all nodes see the same data at the same time) and availability (every request receives a response). You'll have to trade one for the other during a partition or a datastore instance failure.


Should I be ensuring that a given user is always directed to the same datastore instance?

Ideally, you should not! What will you do when the given instance fails? A major feature of a distributed datastore is to be available in spite of network or instance failures.
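To make the failure case concrete, here is a minimal sketch of what "pinning" a user to a node ends up looking like once you account for the node going down. Everything in it is an assumption for illustration: the node names, the `is_healthy` callback, and the helper names `preferred_order` and `pick_node` are hypothetical and are not part of CouchDB or any client library.

```python
import hashlib


def preferred_order(user_id, nodes):
    """Return the nodes ordered by this user's preference.

    Each (user, node) pair gets a stable hash score, so the same user
    always sees the same ordering as long as the node list is unchanged.
    """
    def score(node):
        return hashlib.sha256(f"{user_id}:{node}".encode()).hexdigest()

    return sorted(nodes, key=score)


def pick_node(user_id, nodes, is_healthy):
    """Pick the user's preferred node, falling back if it is down.

    `is_healthy` is a hypothetical callback (node -> bool); a real
    deployment would back it with health checks of some kind.
    """
    for node in preferred_order(user_id, nodes):
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy datastore node available")


if __name__ == "__main__":
    nodes = ["couch-a", "couch-b", "couch-c"]  # hypothetical node names
    print(pick_node("alice", nodes, lambda node: True))
```

Note that the moment the fallback branch is taken, the user is talking to a node that may not have received their latest writes yet, so the application still needs to tolerate stale reads. Pinning reduces how often you hit that case; it does not remove it.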


If a user in a given session is always directed to the same couch node, won't my consistency issues revolve mostly around "shared" user data and thus be greatly simplified?

You're right, the architecture would be a lot simpler that way, but again, what would you do if that instance fails? A lot of engineering effort has gone into distributed systems to allow multiple instances to reply to a query. I am not sure about CouchDB, but Cassandra allows you to choose your consistency level per request; you'll have to trade off availability for a higher degree of consistency. Its client is configured by default to send requests to servers in round-robin fashion, which distributes the load.
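The consistency/availability trade-off in quorum-based stores (the Dynamo-style family that Cassandra belongs to) can be written down as simple arithmetic. The sketch below is only an illustration of the usual overlap rule, with N replicas, W write acknowledgements and R read responses; the function names are my own and it does not call any real client API.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    if n < 1:
        raise ValueError("need at least one replica")
    return n // 2 + 1


def read_overlaps_write(n, w, r):
    """True if every read set must share a replica with every write set.

    With n replicas, a write acknowledged by w of them and a read that
    consults r of them are guaranteed to overlap only when r + w > n.
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3
    # Quorum writes and quorum reads overlap: stronger consistency,
    # but both operations need a majority of replicas to be reachable.
    print(read_overlaps_write(n, quorum(n), quorum(n)))
    # Single-replica writes and reads stay available with two nodes
    # down, but a read may miss the latest write.
    print(read_overlaps_write(n, 1, 1))
```

The higher you set W and R, the more replicas must be reachable for a request to succeed, which is exactly the availability you give up in exchange for consistency.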

I would recommend you read the Dynamo paper. The authors describe a lot of engineering details behind a distributed database.

Helgoland answered 28/7, 2015 at 10:29 Comment(4)
Couchbase is strongly consistent. You read your own writes, and there is only one active representation of the data on the cluster. There are replicas, but consistency is not guaranteed if you query those. You can also choose on each write whether to wait for the document to be persisted to disk, replicated up to four times, or persisted to the master and to other replicas. - Macrophysics
The question specified CouchDB, not Couchbase. - Darill
@Mr Grieves, sorry about that, I meant CouchDB. Though I am not sure about the specifics of CouchDB's implementation of replication, my answer still stands. You will have to give up strong consistency for better availability. - Helgoland
@Helgoland Thanks. I understand that the model does not guarantee consistency across multiple nodes, and I'm already learning to deal with this for cases where multiple users might be editing the same data. I was basically wondering how much work I need to do when I start scaling horizontally. You've answered the question quite well. Thank you. - Darill
