Faster way of counting the total number of columns in a Cassandra row with Hector

I want to count the total number of columns in a Cassandra row using the Hector client. Currently I am doing this with a CountQuery, but it seems really slow: for a row with just 60k columns it takes nearly 2 seconds. My code currently looks like this:

QueryResult<Integer> qr = HFactory.createCountQuery(ksp, se, se)
    .setColumnFamily("ColumnFamily1")
    .setKey("RowKey")
    .setRange(null, null, 1000000000)
    .execute();

PS: I have to set the range limit to such a high number because otherwise the count is capped at the limit I pass to setRange.

Any ideas how I can improve this?

Flor answered 3/1, 2012 at 15:12 Comment(0)

Counting columns in Cassandra is inherently slow. Cassandra has to iterate over the whole row in order to return the count.

You probably want to denormalize the count. You could use a counter column which you update every time you insert.
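To illustrate the idea of a denormalized count, here is a minimal sketch in plain Java. It uses in-memory maps as a stand-in for the column family and the counter, so the class and method names (`RowCountSketch`, `insertColumn`, `deleteColumn`, `columnCount`) are hypothetical and not part of Hector or Cassandra. The delete handling is my addition; the suggestion above only mentions inserts.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of "keep a count next to the data".
// Assumption: this only illustrates the bookkeeping; it is not Hector API.
final class RowCountSketch {
    // row key -> (column name -> value), standing in for the column family
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // row key -> denormalized column count, standing in for a counter column
    private final Map<String, Long> counts = new HashMap<>();

    // Writes a column and bumps the count only if the column name is new.
    void insertColumn(String rowKey, String columnName, String value) {
        Map<String, String> row = rows.computeIfAbsent(rowKey, k -> new HashMap<>());
        boolean isNew = !row.containsKey(columnName);
        row.put(columnName, value);
        if (isNew) {
            counts.merge(rowKey, 1L, Long::sum);
        }
    }

    // Removes a column and decrements the count only if the column existed.
    void deleteColumn(String rowKey, String columnName) {
        Map<String, String> row = rows.get(rowKey);
        if (row != null && row.containsKey(columnName)) {
            row.remove(columnName);
            counts.merge(rowKey, -1L, Long::sum);
        }
    }

    // Reads the stored count without touching the row's columns.
    long columnCount(String rowKey) {
        return counts.getOrDefault(rowKey, 0L);
    }

    public static void main(String[] args) {
        RowCountSketch store = new RowCountSketch();
        store.insertColumn("RowKey", "col1", "a");
        store.insertColumn("RowKey", "col2", "b");
        store.insertColumn("RowKey", "col1", "c"); // overwrite, count unchanged
        System.out.println(store.columnCount("RowKey"));
    }
}
```

Note that the in-memory model can cheaply check whether a column name is new. Against Cassandra that check would need a read before the write, so blindly incrementing a counter on every insert only stays accurate if column names are never overwritten. With Hector, the increment would presumably go through the mutator's counter support, but check the exact calls against the Hector version you are using.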

Popup answered 3/1, 2012 at 17:12 Comment(3)
Thanks. I didn't know that it needs to iterate over the whole row. - Flor
Has this changed in the last 4 years? I mean, does Cassandra now keep some metadata so it can quickly return the number of columns, or does it still iterate over all columns? - Crimpy
No, this hasn't changed. The main reason is that keeping track of that information would slow down the write path. - Popup
