I am using the DataStax Java driver for Apache Cassandra (v. 2.1.9), and I am wondering what should happen when I set replication_factor greater than the number of nodes. I've read somewhere that Cassandra allows this, but that it should fail when I try to save data (of course it depends on the write consistency level, but I mean the case of ALL).
The problem is that everything works and no exception is thrown, even when I try to save data. Why?
Maybe the information I read was outdated and applied to older versions of Cassandra?
One more question: if it is true, then what would happen when I add another node to the cluster?
Cassandra has a concept of "tunable consistency" which in part means you can control the consistency level setting for read/write operations.
You can read a bit more in the docs explaining consistency levels and how to set them in the cqlsh shell.
To learn more, I suggest experimenting with cqlsh on a single-node Cassandra cluster. For example, we can create a keyspace with a replication factor of 2 and load some data into it:
cqlsh> create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor':2};
cqlsh> create table test.keys (key int primary key, val int);
cqlsh> insert into test.keys (key, val) values (1, 1);
cqlsh> select * from test.keys;
 key | val
-----+-----
   1 |   1
Everything works fine because the default consistency level is ONE, so only one replica has to be online. Now try the same with the consistency level set to ALL:
cqlsh> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh> insert into test.keys (key, val) values (2, 2);
Traceback (most recent call last):
  File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
    result = future.result()
  File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}
cqlsh> select * from test.keys;
Traceback (most recent call last):
  File "resources/cassandra/bin/cqlsh.py", line 1324, in perform_simple_statement
    result = future.result()
  File "resources/cassandra/bin/../lib/cassandra-driver.zip/cassandra-driver/cassandra/cluster.py", line 3133, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 2, 'alive_replicas': 1, 'consistency': 'ALL'}
Neither reads nor writes work, because the second replica node doesn't exist. The error message gives a helpful clue: two replicas were required but only one was alive.
Once you understand the behaviour in cqlsh, you can apply the same settings through the Java driver, depending on what your application needs.
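The check behind that error can be modeled in a few lines. This is only a rough sketch of the arithmetic, not Cassandra's actual code: the function names `required_replicas` and `is_available` are made up for illustration, and only the ONE, QUORUM and ALL levels are covered.

```python
def required_replicas(consistency: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level.

    Simplified model (assumption): only ONE, QUORUM and ALL are handled.
    """
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency}")


def is_available(consistency: str, replication_factor: int, alive_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy the consistency level."""
    return alive_replicas >= required_replicas(consistency, replication_factor)


# Single node, keyspace created with replication_factor 2:
print(is_available("ONE", 2, 1))  # True: one live replica is enough
print(is_available("ALL", 2, 1))  # False: 2 required, only 1 alive
```

This matches what cqlsh showed: the insert at ONE succeeds, while anything at ALL reports required_replicas 2 and alive_replicas 1.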
You shouldn't set the replication factor higher than the number of nodes. Cassandra achieves strong consistency when the number of replicas written plus the number of replicas read is greater than the replication factor.
For instance, suppose you have 5 nodes and have set the replication factor to 5. If 1 node goes down, you can no longer reach the highest consistency level, so you lose the advantage of Cassandra's availability.
After you add nodes, you can increase the replication factor accordingly, because a consistency level never requires writing to more replicas than the replication factor specifies.
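To make the arithmetic concrete, here is a small sketch of the "written plus read replicas greater than replication factor" rule and of the 5-node example. The helper names (`quorum`, `is_strongly_consistent`, `achievable_levels`) are hypothetical and this is a simplified model rather than anything taken from Cassandra itself.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """Reads overlap the latest write when W + R > RF."""
    return write_replicas + read_replicas > replication_factor


def achievable_levels(replication_factor: int, alive_replicas: int) -> list:
    """Consistency levels (ONE, QUORUM, ALL only) that can still be met."""
    needed = {"ONE": 1, "QUORUM": quorum(replication_factor), "ALL": replication_factor}
    return [level for level, count in needed.items() if alive_replicas >= count]


rf = 5
# QUORUM writes plus QUORUM reads: 3 + 3 > 5, so reads see the latest write.
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True
# ONE writes plus ONE reads: 1 + 1 is not greater than 5.
print(is_strongly_consistent(1, 1, rf))                    # False
# 5 nodes, RF 5, one node down: ALL is no longer achievable.
print(achievable_levels(rf, alive_replicas=4))             # ['ONE', 'QUORUM']
```

With one node down, QUORUM on both reads and writes still gives overlapping replica sets, which is why QUORUM is usually preferred over ALL when availability matters.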
I think the answer is in this document about how data is distributed across a cluster.
The easiest case for adding new nodes is with vnodes. When you add a new node, it is assigned some of the vnodes (token ranges) that used to belong to other nodes, and everything keeps working fine.
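As a rough illustration of what changes when a node joins, the sketch below builds a toy token ring with several vnodes per node and picks replicas by walking the ring clockwise until enough distinct nodes are found. It is a simplified model under stated assumptions: tokens come from MD5 purely to keep the example deterministic (a real cluster uses its configured partitioner and its own token assignment), and `token`, `build_ring` and `replicas_for` are hypothetical names.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Deterministic toy token (assumption: MD5, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % 2**32


def build_ring(nodes, vnodes_per_node=8):
    """Sorted list of (token, node) pairs, several vnodes per node."""
    return sorted((token(f"{node}#{i}"), node)
                  for node in nodes for i in range(vnodes_per_node))


def replicas_for(key: str, ring, replication_factor: int):
    """Walk the ring clockwise from the key's token, collecting distinct nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, token(key))
    found = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in found:
            found.append(node)
        if len(found) == replication_factor:
            break
    return found


one_node = build_ring(["node1"])
two_nodes = build_ring(["node1", "node2"])

# With replication_factor 2 but a single node, only one replica can exist.
print(len(replicas_for("1", one_node, 2)))   # 1
# After a second node joins, the same key gets its second replica.
print(len(replicas_for("1", two_nodes, 2)))  # 2
```

In this model the number of replicas is capped by the number of nodes, so adding a node to the single-node cluster is what makes a consistency level of ALL satisfiable for a keyspace with replication factor 2.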