I am using Cassandra 2.0 with the Python CQL driver.
I have created a column family as follows:
CREATE KEYSPACE IF NOT EXISTS Identification
    WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
                         'DC1' : 1 };

USE Identification;

CREATE TABLE IF NOT EXISTS entitylookup (
    name varchar,
    value varchar,
    entity_id uuid,
    PRIMARY KEY ((name, value), entity_id))
WITH
    caching=all
;
I then try to count the number of records in this CF as follows:
#!/usr/bin/env python
import argparse
import sys
import traceback

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement


def count(host, cf):
    keyspace = "identification"
    cluster = Cluster([host], port=9042, control_connection_timeout=600000000)
    session = cluster.connect(keyspace)
    # Client-side timeout, in seconds, for requests on this session.
    session.default_timeout = 600000000
    st = SimpleStatement("SELECT count(*) FROM %s" % cf, consistency_level=ConsistencyLevel.ALL)
    for row in session.execute(st, timeout=600000000):
        print("count for cf %s = %s " % (cf, str(row)))
    # Release the driver's connections once the query is done.
    cluster.shutdown()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-cf", "--column-family", default="entitylookup", help="Column Family to query")
    parser.add_argument("-H", "--host", default="localhost", help="Cassandra host")
    args = parser.parse_args()
    count(args.host, args.column_family)
    print("fim")
The count itself is not that useful to me; it's just a test with an operation that takes a long time to complete.
Although I have set the timeout to 600000000 seconds, I get the following error after less than 30 seconds:
./count_entity_lookup.py -H localhost -cf entitylookup
Traceback (most recent call last):
  File "./count_entity_lookup.py", line 27, in <module>
    count(args.host, args.column_family)
  File "./count_entity_lookup.py", line 16, in count
    for row in session.execute(st, timeout=None):
  File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 1026, in execute
    result = future.result(timeout)
  File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 2300, in result
    raise self._final_exception
cassandra.ReadTimeout: code=1200 [Timeout during read request] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'data_retrieved': True, 'required_responses': 2, 'consistency': 5}
It seems the answer was found on just one replica, but this really doesn't make sense to me. Shouldn't Cassandra be able to answer the query anyway?
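To make the `info` part of the error easier to read, here is a minimal sketch that decodes it. The helper name `describe_read_timeout` and the `CONSISTENCY_NAMES` table are my own illustration, not part of the driver; the numeric codes are the consistency level values from the CQL native protocol, under which `5` corresponds to `ALL`.

```python
# Consistency level codes as defined by the CQL native protocol.
# This table is written out by hand for illustration; it is not imported
# from the driver.
CONSISTENCY_NAMES = {
    0: "ANY", 1: "ONE", 2: "TWO", 3: "THREE", 4: "QUORUM", 5: "ALL",
    6: "LOCAL_QUORUM", 7: "EACH_QUORUM", 8: "SERIAL", 9: "LOCAL_SERIAL",
    10: "LOCAL_ONE",
}


def describe_read_timeout(info):
    """Summarise the info dict shown in a ReadTimeout message (hypothetical helper)."""
    level = CONSISTENCY_NAMES.get(info["consistency"], "UNKNOWN")
    received = info["received_responses"]
    required = info["required_responses"]
    return "consistency=%s, received %d of %d responses (%d missing), data_retrieved=%s" % (
        level, received, required, required - received, info["data_retrieved"])


# The info dict copied from the error message above.
info = {'received_responses': 1, 'data_retrieved': True,
        'required_responses': 2, 'consistency': 5}
summary = describe_read_timeout(info)
print(summary)
```

Read this way, the message says the query ran at consistency `ALL`, needed 2 replica responses, and got only 1 before the timeout was reported.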
In the image below, you can see that the number of requests to the cluster was really low, and the latency was low as well. I am not sure why this is happening.