python cql driver - cassandra.ReadTimeout - "Operation timed out - received only 1 responses."

I am using Cassandra 2.0 with the Python CQL driver.

I have created a column family as follows:

CREATE KEYSPACE IF NOT EXISTS Identification
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'DC1' : 1 };

USE Identification;

CREATE TABLE IF NOT EXISTS entitylookup (
  name varchar,
  value varchar,
  entity_id uuid,
  PRIMARY KEY ((name, value), entity_id))
WITH
    caching=all
;

I then try to count the number of records in this CF as follows:

#!/usr/bin/env python
import argparse
import sys
import traceback
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

def count(host, cf):    
    keyspace = "identification"
    cluster = Cluster([host], port=9042, control_connection_timeout=600000000)
    session = cluster.connect(keyspace)
    session.default_timeout=600000000

    st = SimpleStatement("SELECT count(*) FROM %s" % cf, consistency_level=ConsistencyLevel.ALL)
    for row in session.execute(st, timeout=600000000):
        print "count for cf %s = %s " % (cf, str(row))
    # Release the driver's connections once the query has finished.
    cluster.shutdown()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-cf", "--column-family", default="entitylookup", help="Column Family to query")
    parser.add_argument("-H", "--host", default="localhost", help="Cassandra host")    
    args = parser.parse_args()

    count(args.host, args.column_family)

    print "fim"

The count itself is not that useful to me; it's just a test with an operation that takes a long time to complete.

Although I set the timeout to 600000000 seconds, I get the following error after less than 30 seconds:

./count_entity_lookup.py  -H localhost -cf entitylookup 
    Traceback (most recent call last):
      File "./count_entity_lookup.py", line 27, in <module>
        count(args.host, args.column_family)
      File "./count_entity_lookup.py", line 16, in count
        for row in session.execute(st, timeout=None):
      File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 1026, in execute
        result = future.result(timeout)
      File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 2300, in result
        raise self._final_exception
    cassandra.ReadTimeout: code=1200 [Timeout during read request] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'data_retrieved': True, 'required_responses': 2, 'consistency': 5}

It seems the answer was found on just one replica, but this really doesn't make sense to me. Shouldn't Cassandra be able to answer the query anyway?

In the image below, you can see that the number of requests to the cluster was really low, and so was the latency. I am not sure why this is happening.

[image: cluster request rate and latency]

Blackthorn answered 30/5, 2014 at 19:0 Comment(5)
How many nodes do you have running in this cluster? From your description it sounds like just a single node, so it's not clear why the read operation would be expecting 2 responses. If you had a 2-node cluster, only one of which was online, these results would be expected. – Hamitosemitic
I have two nodes in this cluster with RF=2; the write and read consistency levels are both ALL, and both nodes are online. – Blackthorn
Did you ever find a solution to this? – Wigan
About the timeout: I found that changing the timeout in the Cassandra server configuration file was effective. A client timeout can be specified, but it doesn't override the server-side configuration. – Blackthorn
Regarding the slowness itself, it had to do with the size of the requests to Cassandra. The data stored in the column families was too big, which was causing latency. – Blackthorn
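The point about client and server timeouts in the comments above can be sketched in a few lines. This is only an illustration, not driver code: `first_timeout` is a hypothetical helper, and the 10000 ms server value is an assumed example. The real limit comes from the request-timeout settings in `cassandra.yaml` (for instance `read_request_timeout_in_ms` and `range_request_timeout_in_ms`).

```python
def first_timeout(client_timeout_s, server_timeout_ms):
    """Return which side gives up first, and after how many seconds.

    client_timeout_s: timeout passed to session.execute(); None means wait forever.
    server_timeout_ms: server-side request timeout (assumed value, see cassandra.yaml).
    """
    server_s = server_timeout_ms / 1000.0
    if client_timeout_s is None or server_s <= client_timeout_s:
        # The coordinator aborts the request first and reports a ReadTimeout.
        return ("server", server_s)
    # The client stops waiting first.
    return ("client", client_timeout_s)


# Values from the question: a huge client timeout cannot extend the server's limit.
huge_client = first_timeout(600000000, 10000)
no_client_limit = first_timeout(None, 10000)
short_client = first_timeout(2, 10000)
```

With the Python driver, the first case surfaces as `cassandra.ReadTimeout` (raised because the coordinator reported a timeout), while a client-side expiry surfaces as `cassandra.OperationTimedOut`. That is why raising only the client value did not change the outcome here.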

From the response:

'received_responses': 1, 'data_retrieved': True, 'required_responses': 2

Only one node returned data, while the query required consistency ALL, which here means two responses. Cassandra could not fulfill the request and timed out.
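To read the error fields more easily, here is a minimal sketch that decodes the `info` dict from the traceback. `describe_read_timeout` is a hypothetical helper, not part of the driver; the numeric codes follow the CQL native protocol numbering, which the driver's `ConsistencyLevel` class also uses.

```python
# Consistency-level codes as numbered by the CQL native protocol.
CONSISTENCY_NAMES = {
    0: "ANY", 1: "ONE", 2: "TWO", 3: "THREE", 4: "QUORUM",
    5: "ALL", 6: "LOCAL_QUORUM", 7: "EACH_QUORUM",
    8: "SERIAL", 9: "LOCAL_SERIAL", 10: "LOCAL_ONE",
}


def describe_read_timeout(info):
    """Summarize the info dict attached to a read timeout (hypothetical helper)."""
    name = CONSISTENCY_NAMES.get(info["consistency"], "UNKNOWN")
    received = info["received_responses"]
    required = info["required_responses"]
    return "consistency=%s: %d of %d replicas answered, %d missing" % (
        name, received, required, required - received)


# The info dict copied from the traceback in the question.
info = {'received_responses': 1, 'data_retrieved': True,
        'required_responses': 2, 'consistency': 5}
summary = describe_read_timeout(info)
```

So `'consistency': 5` is ALL, and the coordinator was waiting for a second replica that did not answer in time.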

You may change the write consistency to 'ALL' if it is required that all nodes have the data.

That would ensure all read requests can be satisfied without reading at consistency ALL, since the guarantee is already provided by the write itself. The trade-off is that writes may fail if a node is offline.
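The trade-off between write and read consistency is usually summarized by the overlap rule: a read is guaranteed to touch at least one replica holding the latest write when the number of replicas written plus the number read exceeds the replication factor. A small sketch, with a hypothetical function name and RF=2 taken from the asker's comment:

```python
def read_sees_latest_write(write_replicas, read_replicas, replication_factor):
    """True when the write set and the read set must share at least one replica."""
    return write_replicas + read_replicas > replication_factor


rf = 2  # replication factor reported by the asker

write_all_read_one = read_sees_latest_write(rf, 1, rf)  # write ALL, read ONE
write_one_read_one = read_sees_latest_write(1, 1, rf)   # write ONE, read ONE
write_one_read_all = read_sees_latest_write(1, rf, rf)  # write ONE, read ALL
```

Writing at ALL and reading at ONE satisfies the rule, so reads no longer depend on every replica answering. Writing and reading at ONE does not.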

See the documentation for an explanation of what each consistency level means.

LOCAL_QUORUM is what you would use to ensure that a majority of replicas, with respect to the replication factor, are contacted within a DC.
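As a rough illustration of how many replica responses each level waits for, here is a sketch assuming a single data center. `required_responses` is a hypothetical helper, and for LOCAL_QUORUM the replication factor passed in is the local DC's.

```python
def required_responses(level, replication_factor):
    """Replica responses a coordinator waits for (single-DC sketch)."""
    if level == "ONE":
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("level not covered by this sketch: %r" % (level,))


quorum_rf2 = required_responses("LOCAL_QUORUM", 2)
quorum_rf3 = required_responses("LOCAL_QUORUM", 3)
all_rf2 = required_responses("ALL", 2)
```

Note that with RF=2 a quorum is also 2 replicas, so switching from ALL to a quorum level only tolerates a slow or missing replica once the replication factor is 3 or more.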

Pyroclastic answered 1/9, 2016 at 10:14 Comment(6)
Why was required_responses 2, if the replication factor was 1 for the DC? – Blackthorn
ALL means all :) Replication factor does not matter when consistency is set to ALL, meaning all nodes need to be contacted. Perhaps you meant to use QUORUM to contact a majority of nodes with respect to the replication factor. – Pyroclastic
All nodes that contain the data, i.e. the replication factor. If you have 1000 nodes in your cluster with replication factor 3, then 3 nodes should be contacted for consistency ALL, not 1000. – Blackthorn
All nodes in all DCs, yes. Is the other node in another DC, and does it have the key? ALL would catch it. – Pyroclastic
I had just 1 DC. – Blackthorn
According to Cassandra, there is another node in the cluster which has the data. The query requires ALL and one node could not respond, so the query failed. Where that node is, I do not know. – Pyroclastic
