Cassandra order and clustering key
Asked Answered
S

1

7

I have this table:

CREATE TABLE custumer_events_service.events_by_websiteId_time(
    "event_id" text,
    "currentTime" timestamp,
    "websiteId" varchar,

    OTHER COLUMNS ...

    PRIMARY KEY(event_id, websiteId, currentTime)
)

In this case, would I get 10000 rows ordered by currentime when I execute this query:

SELECT * FROM events_by_websiteid_time WHERE websiteid='xxxx' LIMIT 10000 ALLOW FILTERING;

Or did I have to add WITH CLUSTERING ORDER BY (currentTime DESC); at the end?

Strain answered 1/3, 2016 at 17:54 Comment(0)
S
9

Cassandra can only enforce a sort order within a partition. As you are using ALLOW FILTERING to avoid having to provide your partition key (event_id) your result set will be ordered by the hashed token values of each event_id, and then by websiteid and currentTime.

To get your results to be ordered by currentTime, you would need to either create a new query table or alter the PRIMARY KEY definition (and perhaps the CLUSTERING ORDER) of your existing table. If you decide to create a new query table, it would have to look something like this:

CREATE TABLE custumer_events_service.events_by_websiteId_time_eventid(
  event_id text,
  currentTime timestamp,
  websiteId varchar,

OTHER COLUMNS ...

  PRIMARY KEY (websiteid,currentTime,event_id))
WITH CLUSTERING ORDER BY (currentTime DESC, event_id ASC);

That would allow this query:

SELECT * FROM events_by_websiteid_time_eventid WHERE websiteid='xxxx' LIMIT 10000;

...to work as you expect.

Slovakia answered 1/3, 2016 at 18:18 Comment(5)
+1 Thanks for your answer Aron (once again ;-) ), what are the downsides of the first solution ( using CLUSTERING ORDER )? wiIth the second solution, I am not risking to disbalance my cluster as I have a non-equivalent distribution of events by websiteId?Strain
@Strain The downside of the first solution, is that involves duplicating your data into a new query table. But if you need to serve queries for keyed off of both websiteid and event_id, then you may need both tables...trading disk for performance. As for the imbalance in the 2nd solution, if it is a drastic difference, then maybe it would make sense to add an additional "bucket" to help partition your data like day or month, or something else that makes sense for your use case.Slovakia
As I am learning cassandra, I have posted a new question #35880749 can you take a look? Thanks in advanceStrain
@Slovakia What do you mean by "a query table"? Is it somehow different from "a table"?Basting
@IvanBalashov The term "query table" is used to describe a table which is is designed to hold the same data as an existing table, but with a different PRIMARY KEY definition. This allows it to support querying the same data with a different query. It is a functional term; query tables are technically the same as "normal" tables.Slovakia

© 2022 - 2024 — McMap. All rights reserved.