What leads to wide rows in Cassandra?
I found the following in this post:

create table posts(username varchar, time timeuuid, post_text varchar, primary key(username, time))

There will only be as many CF rows as there are variations of the first element in your primary key. This can be a problem if this element has a very low cardinality as you can end up with very wide CF rows.

My point is:

Regarding the part about "the first element in your primary key" (bolded in my original question): shouldn't this be the second element of the primary key? That is, isn't it the second (clustering) element that causes wide rows?

Baboon asked 1/8, 2014 at 21:33
It's a matter of definitions and terminology. A wide row and a row are not the same thing. As a definition, I would say that in a table with PRIMARY KEY (partition, clustering) there will be as many wide rows as there are distinct partition keys. The number of rows, instead, is the sum over all partitions of the number of distinct clustering keys in each partition.

So in the sentence you quoted, the author wrote "rows" but meant "wide rows":

There will only be as many CF wide-rows as there are variations of the first element in your primary key. This can be a problem if this element has a very low cardinality as you can end up with very wide CF rows.
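To make the cardinality point concrete, here is a minimal toy model in Python. It is not Cassandra code and does not touch a cluster. It assumes a hypothetical comments table with made-up columns `user_id`, `vote` and `comment_id`, groups the rows by whichever column is chosen as the partition key, and reports how many partitions (wide rows) result and how large the biggest one is.

```python
from collections import defaultdict


def partition_sizes(rows, partition_key):
    """Group rows (dicts) by the chosen partition-key column.

    Returns a mapping: partition key value -> number of rows in that partition.
    Toy model of the logical data layout only, not Cassandra's storage engine.
    """
    sizes = defaultdict(int)
    for row in rows:
        sizes[row[partition_key]] += 1
    return dict(sizes)


# Hypothetical data: 10,000 comments from 1,000 users, each with a vote from 1 to 10.
comments = [
    {"user_id": f"user{i % 1000}", "vote": (i % 10) + 1, "comment_id": i}
    for i in range(10_000)
]

by_vote = partition_sizes(comments, "vote")     # low-cardinality partition key
by_user = partition_sizes(comments, "user_id")  # high-cardinality partition key

print(len(by_vote), max(by_vote.values()))  # 10 partitions, 1000 rows each
print(len(by_user), max(by_user.values()))  # 1000 partitions, 10 rows each
```

The total number of rows is the same in both cases. Only the choice of partition key changes it from a few very wide rows to many narrow ones.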

Probably the term "wide row" was not in common use when that post was written. So, given a table like this:

CREATE TABLE wide_rows (
  partitionkey text,
  clusteringkey text,
  data text,
  PRIMARY KEY ((partitionkey), clusteringkey)
);

there will be only as many wide rows as there are distinct partitionkey values, but the number of rows depends on both the partition key and the clustering key:

insert into wide_rows(partitionkey, clusteringkey, data) VALUES ( 'eagertoLearn', 'stackoverflow', 'cassandra question');
insert into wide_rows(partitionkey, clusteringkey, data) VALUES ( 'eagertoLearn', 'google groups', 'cql question');
insert into wide_rows(partitionkey, clusteringkey, data) VALUES ( 'eagertoLearn', 'askubuntu', 'linux shell question');
select * from wide_rows where partitionkey = 'eagertoLearn';

 partitionkey | clusteringkey | data
--------------+---------------+----------------------
 eagertoLearn |     askubuntu | linux shell question
 eagertoLearn | google groups |         cql question
 eagertoLearn | stackoverflow |   cassandra question

(3 rows)

CQL says that I got 3 rows back, but these 3 rows belong to the same partition key, so this is 1 wide row.
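The same counting can be sketched as a tiny in-memory Python model. `ToyTable` is a hypothetical class made up for illustration, not a Cassandra or driver API. Each partition key maps to a dictionary of clustering key to data. Re-inserting the same primary key overwrites the row, as a CQL INSERT does. Rows come back sorted by clustering key, mirroring the default ascending clustering order seen in the cqlsh output above.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a CQL table with PRIMARY KEY ((partitionkey), clusteringkey).

    Hypothetical illustration only: it mimics the logical layout, not storage.
    """

    def __init__(self):
        # partition key -> {clustering key -> data}
        self.partitions = defaultdict(dict)

    def insert(self, partitionkey, clusteringkey, data):
        # Writing the same primary key again overwrites the row (upsert).
        self.partitions[partitionkey][clusteringkey] = data

    def select(self, partitionkey):
        # Rows of one partition, ordered by clustering key (ascending).
        part = self.partitions.get(partitionkey, {})
        return [(partitionkey, ck, part[ck]) for ck in sorted(part)]

    def wide_row_count(self):
        # One wide row (partition) per distinct partition key.
        return len(self.partitions)

    def row_count(self):
        # CQL rows = sum over partitions of the distinct clustering keys.
        return sum(len(p) for p in self.partitions.values())


table = ToyTable()
table.insert("eagertoLearn", "stackoverflow", "cassandra question")
table.insert("eagertoLearn", "google groups", "cql question")
table.insert("eagertoLearn", "askubuntu", "linux shell question")

for row in table.select("eagertoLearn"):
    print(row)
print(table.row_count(), "rows in", table.wide_row_count(), "wide row")
```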

HTH, Carlo

Illconditioned answered 2/8, 2014 at 6:32
Thanks for the answer. What is meant by low cardinality and high cardinality as described above? - Baboon
By cardinality they mean the number of possible distinct values. Say you store comments, and each comment has a vote from 1 to 10. If you choose the vote as the partition key, you have a low-cardinality key, since there can be only 10 possible wide-row keys. If instead you choose the user id, you can have as many wide rows as there are users registered on the comment platform. - Illconditioned
I have posted a question on cardinality here: #25101676. Please help. Thanks! - Baboon
