RPC timeout error while exporting data from CQL

I am trying to export data from Cassandra using the CQL client. A column family has about 100,000 rows in it. When I copy the data into a CSV file using the COPY TO command, I get the following rpc_timeout error:

copy mycolfamily to '/root/mycolfamily.csv'
Request did not complete within rpc_timeout.

I am running:

[cqlsh 3.1.6 | Cassandra 1.2.8 | CQL spec 3.0.0 | Thrift protocol 19.36.0]

How can I increase the RPC timeout limit?

I tried adding rpc_timeout_in_ms: 20000 (the default is 10000) to my conf/cassandra.yaml file, but on restarting Cassandra I get:

[root@user ~]# null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=rpc_timeout_in_ms for JavaBean=org.apache.cassandra.config.Config@71bfc4fc; Unable to find property 'rpc_timeout_in_ms' on class: org.apache.cassandra.config.Config
Invalid yaml; unable to start server.  See log for stacktrace.
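
(For reference: Cassandra 1.2 split the old rpc_timeout_in_ms setting into per-operation timeouts, which is why the property no longer exists on the Config class. On 1.2.x the equivalent change, assuming the option names from the stock cassandra.yaml, would look like:)

# conf/cassandra.yaml on Cassandra 1.2.x; values are in milliseconds
read_request_timeout_in_ms: 20000
range_request_timeout_in_ms: 20000
write_request_timeout_in_ms: 20000
request_timeout_in_ms: 20000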
Consentaneous answered 18/9, 2013 at 12:35 Comment(1)
Any help? I know an hour is too short a time to expect an answer, but it is very important for me to continue with what I am doing. Thanks – Consentaneous

The COPY command currently does the same thing as a SELECT with LIMIT 99999999, so it will eventually time out as your data grows. Here's the export function:

https://github.com/apache/cassandra/blob/trunk/bin/cqlsh#L1524

I'm doing the same export in production. My approach is the following:

  • make select * from table where timeuuid > someTimeuuid limit 10000
  • write the result set to a csv file in append (>>) mode
  • make the next select starting from the last timeuuid returned

You can pipe a command into cqlsh like this:

echo "{$cql}" | /usr/bin/cqlsh -u user -p password localhost 9160 > file.csv

Quijano answered 18/9, 2013 at 14:10 Comment(5)
It means that even if you raise the RPC timeout, the COPY command will eventually time out. So you should select with LIMIT 10000, write to a csv file, then select the next 10000, etc. – Quijano
I got the logic. Can you please tell me how I can write the output of a select statement to a file? I know it's a silly wish – Consentaneous
@venuktan I didn't check if that works, but it should. Just don't forget to add "{$cql}" to escape the CQL – Quijano
My question is: how would blob data be stored? The COPY command stores data in text format. – Weakwilled
I'm not sure, but I remember that if you export a blob, it'll be exported as [0-1] – Quijano

You can use automatic paging by specifying a fetch size in the DataStax Java driver:

import com.datastax.driver.core.*; // DataStax Java driver

Statement stmt = new SimpleStatement("SELECT id FROM mycolfamily;");
stmt.setFetchSize(500);
ResultSet result = session.execute(stmt);
// Iterating the ResultSet directly fetches rows 500 at a time;
// result.all() would instead load the entire result into memory.
for (Row r : result) {
    // write to file
}
Allpurpose answered 20/1, 2015 at 6:56 Comment(0)

I encountered the same problem a few minutes ago, then I found CAPTURE and it worked:

First start capturing in cqlsh, then run your query with some LIMIT of your choice.
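
For example (mykeyspace.mycolfamily and the LIMIT are placeholders; cqlsh appends each query's output to the file until you turn capturing off):

cqlsh> CAPTURE '/root/mycolfamily.csv'
cqlsh> SELECT * FROM mykeyspace.mycolfamily LIMIT 10000;
cqlsh> CAPTURE OFF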

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/capture_r.html

Tops answered 11/2, 2015 at 18:3 Comment(0)

The best way to export the data is to use the nodetool snapshot option. It returns immediately and the data can be restored later. The only issue is that the snapshot is per node, so to cover the entire cluster you have to run it on every node.

Example: nodetool -h localhost -p 7199 snapshot
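
The snapshot itself is written as hardlinked SSTables under each node's data directory; assuming the default data location, the files land under a path like:

/var/lib/cassandra/data/<keyspace>/<column_family>/snapshots/<snapshot_name>/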

See reference: http://docs.datastax.com/en/archived/cassandra/1.1/docs/backup_restore.html#taking-a-snapshot

Antiphonary answered 31/5, 2015 at 7:38 Comment(0)
