I want to experiment with using Cassandra as an event store in an event sourcing application. My requirements for an event store are quite simple. The event 'schema' would be something like this:
- id: the id of an aggregate root entity
- data: the serialized event data (e.g. JSON)
- timestamp: when the event occurred
- sequence_number: the unique version of the event
I am completely new to Cassandra, so forgive my ignorance in what I'm about to write. There are only two queries I'd ever want to run on this data:
- Give me all events for a given aggregate root id
- Give me all events for a given aggregate root id where the sequence number is > x
My idea is to create a Cassandra table in CQL like this:
CREATE TABLE events (
    id uuid,
    seq_num int,
    data text,
    timestamp timestamp,
    PRIMARY KEY (id, seq_num)
);
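In case it helps, here is roughly what the two reads would look like in CQL against that table (the placeholders are just illustrative):

-- all events for one aggregate root, in version order
SELECT seq_num, data, timestamp FROM events WHERE id = ?;

-- only the events after a known version x
SELECT seq_num, data, timestamp FROM events WHERE id = ? AND seq_num > ?;

My understanding is that, because seq_num is the clustering column, both queries would read a single partition in clustering order without needing a secondary index, but I'd like confirmation of that.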
Does this seem like a sensible way to model the problem? And, importantly, does using a compound primary key allow me to perform the queries I specified efficiently? Remember that, given the use case, there could be a large number of events (each with a different seq_num) for the same aggregate root id.
My specific concern is that the second query will be inefficient in some way (I'm wondering whether it would require a secondary index...).