Cassandra replication factor greater than number of nodes

Asked 9/3, 2016 at 11:3 Answered 9/3, 2016 at 23:53

Solved java cassandra datastax-java-driver cassandra-2.1

I am using the DataStax Java driver for Apache Cassandra (v. 2.1.9), and I am wondering what should happen when I set replication_factor greater than the number of nodes. I've read somewhere that Cassandra allows this operation but that it should fail when I try to save data (of course that depends on the write consistency level, but I mean the case of ALL).
The problem is that everything works: no exception is thrown, even when I save data. Why?
Maybe the information I read was old and applied to older versions of Cassandra? One more question: if it is true, what would happen when I add another node to the cluster?

Segment asked 9/3, 2016 at 11:3

Cassandra has a concept of "tunable consistency", which in part means you can control the consistency level of each read and write operation.

You can read more in the docs, which explain consistency levels and how to set them in cqlsh.

To learn more, I suggest experimenting with cqlsh on a single-node Cassandra cluster. For example, we can create a keyspace with a replication factor of 2 and load some data into it:

cqlsh> create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor':2};
cqlsh> create table test.keys (key int primary key, val int);
cqlsh> insert into test.keys (key, val) values (1, 1);
cqlsh> select * from test.keys;

 key | val
-----+-----
   1 |   1

Everything works because the default consistency level is ONE, so only one replica had to be online. Now try the same with the consistency level set to ALL:

cqlsh> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh> insert into test.keys (key, val) values (2, 2);
Traceback (most recent call last):
  File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
    result = future.result()
  File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}

cqlsh> select * from test.keys;
Traceback (most recent call last):
  File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
    result = future.result()
  File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}

Neither reads nor writes work, because the second node doesn't exist. The error message even gives a helpful clue: two replicas were required, but only one was alive.
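To make the arithmetic behind that message concrete, here is a minimal, driver-free Java sketch of the availability check a coordinator performs before running a request. It is a simplified model, not Cassandra's code: the class and method names are hypothetical, only ONE, QUORUM and ALL are modelled, QUORUM is taken as floor(RF / 2) + 1, and every node is assumed to be up.

```java
/**
 * Hypothetical, simplified model of the "Cannot achieve consistency level"
 * check. Not Cassandra source code. Assumptions: only ONE, QUORUM and ALL,
 * QUORUM = floor(RF / 2) + 1, all nodes up, datacenter-aware levels ignored.
 */
public class ConsistencyCheck {
    enum Level { ONE, QUORUM, ALL }

    // Replicas that must respond; derived from the keyspace's replication factor.
    static int requiredReplicas(Level level, int replicationFactor) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL:    return replicationFactor;
            default:     throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    // A key cannot have more distinct replicas than there are nodes,
    // so with RF > cluster size the alive count is capped by the node count.
    static int aliveReplicas(int replicationFactor, int clusterSize) {
        return Math.min(replicationFactor, clusterSize);
    }

    static boolean canAchieve(Level level, int replicationFactor, int clusterSize) {
        return aliveReplicas(replicationFactor, clusterSize)
                >= requiredReplicas(level, replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 2, nodes = 1; // the single-node, RF = 2 keyspace from the cqlsh session
        for (Level level : Level.values()) {
            System.out.printf("%-6s required=%d alive=%d ok=%b%n", level,
                    requiredReplicas(level, rf), aliveReplicas(rf, nodes),
                    canAchieve(level, rf, nodes));
        }
    }
}
```

With RF = 2 on one node, this model reports that ONE succeeds while QUORUM and ALL do not, which matches the cqlsh session: the required count comes from the replication factor, while the alive count can never exceed the number of nodes.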

Once you understand the behavior in cqlsh, you can apply the same settings through the Java driver, depending on what your application needs.

Effluent answered 9/3, 2016 at 23:53

My mistake: I defined a query string, created a statement from it, set the statement's consistency level to ALL and finally... executed the query string instead of the statement :D Sorry for the question; my code was wrong. Thanks for the answer. – Segment 10/3, 2016 at 12:43
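The pitfall in that comment is easy to reproduce without a cluster. The sketch below is a hypothetical model: FakeSession and FakeStatement are illustrative stand-ins, not DataStax driver classes, and the default level of ONE is an assumption borrowed from the cqlsh default mentioned above. It only mirrors the fact that the driver's Session offers both execute(String) and execute(Statement) overloads, so a consistency level set on a statement has no effect unless that statement object is what gets executed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, driver-free model of executing the query string instead of
 * the configured statement. FakeSession and FakeStatement are NOT DataStax
 * classes; they only mimic the execute(String) / execute(Statement) overloads.
 */
public class ExecuteOverloadPitfall {
    enum Level { ONE, ALL }

    // Stand-in for a statement that carries its own consistency level.
    static final class FakeStatement {
        final String cql;
        Level level = Level.ONE; // assumed default, mirroring cqlsh's default of ONE

        FakeStatement(String cql) { this.cql = cql; }

        FakeStatement setConsistencyLevel(Level level) {
            this.level = level;
            return this;
        }
    }

    // Stand-in session that records the consistency level of every request.
    static final class FakeSession {
        final List<Level> executedWith = new ArrayList<>();

        // A raw string never sees the statement you configured, so it gets
        // wrapped in a fresh statement that uses the default level.
        void execute(String cql) { execute(new FakeStatement(cql)); }

        void execute(FakeStatement statement) { executedWith.add(statement.level); }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        String query = "INSERT INTO test.keys (key, val) VALUES (2, 2)";
        FakeStatement statement = new FakeStatement(query).setConsistencyLevel(Level.ALL);

        session.execute(query);     // the bug: runs at the default level
        session.execute(statement); // the fix: runs at ALL, as intended

        System.out.println(session.executedWith); // [ONE, ALL]
    }
}
```

In the real driver, passing the configured statement object to execute is what makes the ALL request fail with an unavailable error on a single node.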

The reason you shouldn't set it higher than the number of nodes is that Cassandra gives you strong consistency when the number of replicas written plus the number of replicas read is greater than the replication factor.

For instance, suppose you have 5 nodes and set the replication factor to 5. If one node goes down, you can no longer get the highest consistency level, so you lose the availability advantage that Cassandra offers.
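The two rules of thumb in this answer can be checked with a few lines of arithmetic. This is a hypothetical Java sketch, not driver or Cassandra code: it assumes QUORUM = floor(RF / 2) + 1 and treats a level as tolerating RF minus its required replica count in unavailable replicas.

```java
/**
 * Hypothetical arithmetic for the RF = 5 example (not driver code).
 * Assumptions: QUORUM = floor(RF / 2) + 1; a level tolerates
 * (RF - required replicas) unavailable replicas.
 */
public class ReplicaMath {
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // Every read set overlaps every write set in at least one replica
    // exactly when R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    // How many replicas of a key may be down while a request still succeeds.
    static int toleratedFailures(int requiredReplicas, int rf) {
        return rf - requiredReplicas;
    }

    public static void main(String[] args) {
        int rf = 5; // the 5-node, RF = 5 example above
        System.out.println("QUORUM/QUORUM overlap: " + readsSeeLatestWrite(quorum(rf), quorum(rf), rf));
        System.out.println("ONE/ONE overlap:       " + readsSeeLatestWrite(1, 1, rf));
        System.out.println("ALL tolerates    " + toleratedFailures(rf, rf) + " down replicas");
        System.out.println("QUORUM tolerates " + toleratedFailures(quorum(rf), rf) + " down replicas");
    }
}
```

Under these assumptions, QUORUM reads plus QUORUM writes (3 + 3 > 5) still overlap, and QUORUM keeps working with up to two nodes down, whereas ALL fails as soon as a single node is lost.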

After you add nodes, you could increase the factor accordingly, since whatever the consistency level, a write never goes to more replicas than the replication factor specifies.

Subcontraoctave answered 9/3, 2016 at 11:20

Yes, I know that I shouldn't set that value. It's not production code; I've just written some unit tests to learn Cassandra, and I believed that setting replication_factor greater than the number of nodes would fail, but it didn't. So I started searching for the reason, haven't found it yet, and asked here. – Segment 9/3, 2016 at 12:22

It wouldn't fail immediately. – Subcontraoctave 9/3, 2016 at 14:21

I think the answer is in this document about how data is distributed across a cluster.

The easiest case for adding new nodes is with vnodes. When you add a new node, it is assigned some of the vnodes (token ranges) that used to belong to other nodes, and everything keeps working fine.
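A toy model helps show what happens to a keyspace whose replication factor exceeds the node count. The Java sketch below is hypothetical and heavily simplified: it imitates SimpleStrategy-style placement by walking a token ring clockwise and collecting distinct nodes, but tokens come from a seeded Random and keys are hashed with String.hashCode(), neither of which is what Cassandra's partitioner does. In this model a key simply gets fewer replicas than requested until enough nodes join.

```java
import java.util.*;

/**
 * Hypothetical toy model of vnode token ownership and SimpleStrategy-like
 * replica placement. Tokens come from a seeded Random and keys are hashed
 * with String.hashCode(); both are illustrative assumptions, not Cassandra's
 * Murmur3Partitioner.
 */
public class VnodeRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final Random random = new Random(42);
    private final int vnodesPerNode;

    VnodeRing(int vnodesPerNode) {
        this.vnodesPerNode = vnodesPerNode;
    }

    // A joining node claims several random tokens, taking over slices of
    // token ranges previously owned by other nodes.
    void addNode(String node) {
        for (int i = 0; i < vnodesPerNode; i++) {
            ring.put(random.nextLong(), node);
        }
    }

    // Walk clockwise from the key's token and collect distinct nodes until RF
    // replicas are found or the ring is exhausted. With RF > node count the
    // walk runs out of distinct nodes, so the key gets fewer replicas.
    List<String> replicasFor(String key, int rf) {
        long token = key.hashCode();
        List<String> walk = new ArrayList<>(ring.tailMap(token, true).values());
        walk.addAll(ring.headMap(token, false).values());
        Set<String> replicas = new LinkedHashSet<>();
        for (String node : walk) {
            if (replicas.size() == rf) {
                break;
            }
            replicas.add(node);
        }
        return new ArrayList<>(replicas);
    }

    public static void main(String[] args) {
        VnodeRing cluster = new VnodeRing(8);
        cluster.addNode("node1");
        cluster.addNode("node2");
        System.out.println("RF=3, 2 nodes: " + cluster.replicasFor("key-1", 3));
        cluster.addNode("node3");
        System.out.println("RF=3, 3 nodes: " + cluster.replicasFor("key-1", 3));
    }
}
```

In this model, the third node becomes the missing replica once it joins. In a real cluster the joining node also has to receive the data for its new token ranges, which this sketch does not simulate.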

Isomorphism answered 9/3, 2016 at 12:59