Querying Cassandra by a partial partition key

In Cassandra, I can create a composite partition key, separate from my clustering key:

CREATE TABLE footable (
    column1 text,
    column2 text,
    column3 text,
    column4 text,
    PRIMARY KEY ((column1, column2))
)

As I understand it, querying by partition key is an extremely efficient (the most efficient?) method for retrieving data. What I don't know, however, is whether it's also efficient to query by only part of a composite partition key.

In MSSQL, this would be efficient, as long as components are included starting with the first (column1 instead of column2, in this example). Is this also the case in Cassandra? Is it highly efficient to query for rows based only on column1, here?

Persuasive answered 3/12, 2014 at 16:41 Comment(3)
If you want to query on just part of the primary key, you could define it as PRIMARY KEY (column1, column2). However, this means that the partitions (where the data is stored) are only determined by column1. This may result in hot nodes, or other issues depending on the cardinality of column1. If you provide more details on your schema (particularly what column1 and column2 represent) we may be able to suggest an effective middle ground for you. - Rivy
I'm looking for more of a general strategy here, not a particular recommendation. My actual problem, however, is not that I'd end up with hotspots, it's that I'm likely to exceed the ~2bn column limit at some point, because I'm also using a clustering key (a timestamp). - Persuasive
@Rivy also, to be totally clear, I was asking about querying based on part of the partition key, not part of the primary key. It is possible, in a way, using the IN clause on the last part of the partition key. This is sufficient for my use case. - Persuasive
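
One way to read the IN workaround mentioned in the last comment: a restriction such as WHERE column1 = ... AND column2 IN (...) still names every partition key component, so it amounts to a list of complete keys rather than a partial one. The Python sketch below only illustrates that expansion. The function name and the sample values are hypothetical, and it does not model how Cassandra actually executes such a query.

```python
from itertools import product

def expand_partition_keys(column1_values, column2_values):
    """Enumerate every complete (column1, column2) key implied by the restrictions."""
    # Each resulting tuple is a fully specified partition key, so each one
    # can be resolved as an ordinary single-partition read.
    return list(product(column1_values, column2_values))

# Hypothetical values: one fixed column1 and two candidates for column2.
keys = expand_partition_keys(["sensor-1"], ["2014-11", "2014-12"])
```

The number of partitions read grows with the size of the IN list, so this stays cheap only while the list of candidate values is short.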

This is not the case in Cassandra, because it is not possible. Restricting only the first part of a composite partition key in a query will yield the following error:

Partition key part entity must be restricted since preceding part is

Check out this Cassandra 2014 SF Summit presentation from DataStax MVP Robbie Strickland titled "CQL Under the Hood." Slides 62-64 show that the complete partition key is used as the rowkey. With composite partitioning keys in Cassandra, you must query by all of the rowkey or none of it.

You can watch the complete presentation video here.

Krishna answered 3/12, 2014 at 17:0 Comment(0)

This is impossible in Cassandra because it would require a full table scan to resolve such a query. The location of a partition is defined by a hash of all members of the composite partition key, which means that giving only half of the key is as good as giving none of it. The only way to find the record would be to search through all keys and check whether they match.
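
To make the hashing argument concrete, here is a minimal Python sketch of a toy store, not Cassandra's implementation. It assumes MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3Partitioner) and a plain dictionary in place of the token ring. The function names and sample rows are hypothetical.

```python
import hashlib

def token(column1: str, column2: str) -> int:
    """Toy partitioner: hash every partition key component together."""
    # MD5 is only a stand-in for the real partitioner (assumption of this sketch).
    raw = "\x00".join((column1, column2)).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")

# Toy storage: a partition can only be addressed by its token.
partitions = {}

def insert(column1: str, column2: str, column3: str) -> None:
    partitions[token(column1, column2)] = {
        "column1": column1,
        "column2": column2,
        "column3": column3,
    }

def get_by_full_key(column1: str, column2: str):
    """Complete key: one hash, one direct lookup."""
    return partitions.get(token(column1, column2))

def find_by_column1(column1: str):
    """Partial key: no token can be computed, so every partition is inspected."""
    return [row for row in partitions.values() if row["column1"] == column1]

insert("a", "x", "row 1")
insert("a", "y", "row 2")
insert("b", "x", "row 3")
```

Here get_by_full_key("a", "x") touches exactly one entry, while find_by_column1("a") has to walk the whole dictionary, because the tokens for ("a", "x") and ("a", "y") are unrelated values that cannot be derived from "a" alone.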

Photoconduction answered 3/12, 2014 at 17:1 Comment(1)
"giving only half of the key is as good as giving none of it" - nicely put! - Krishna
