I've been working with Elasticsearch and this can be an answer to this problem... Assuming you are willing to use Elassandra instead of Cassandra.
The search system maintains many statistics and within seconds of the last updates it should have a good idea of how many rows you have in a table.
Here is a Match All Query request that gives you the information:
curl -XGET \
-H 'Content-Type: application/json' \
"http://127.0.0.1:9200/<search-keyspace>/_search/?pretty=true" \
-d '{ "size": 1, "query": { "match_all": {} } }'
Where the <search-keyspace>
is a keyspace that Elassandra creates. It generally is named something like <keyspace>_<table>
, so if you have a keyspace named foo
and a table named bar
in that keyspace, the URL will use .../foo_bar/...
. If you want to get the total number of rows in all your tables, then just use /_search/
.
The output is a JSON which looks like this:
{
"took" : 124,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 519659, <-- this is your number
"max_score" : 1.0,
"hits" : [
{
"_index" : "foo_bar",
"_type" : "content",
"_id" : "cda683e5-d5c7-4769-8e2c-d0a30eca1284",
"_score" : 1.0,
"_source" : {
"date" : "2018-12-29T00:06:27.710Z",
"key" : "cda683e5-d5c7-4769-8e2c-d0a30eca1284"
}
}
]
}
}
And in terms of speed, this takes milliseconds, whatever the number of rows. I have tables with many millions of rows and it works like a charm. No need to wait hours or anything like that.
As others have mentioned, Elassandra is still a system heavily used in parallel by many computers. The counters will change quickly if you have many updates all the time. So the numbers you get from Elasticsearch are correct only if you prevent further updates for long enough for the counters to settle. Otherwise it's always going to be an approximate result.