Recommended Cassandra Schema to store application logs

id text, // uuid phone_number text, // recipient of the SMS message text, // Message sent status boolean, // if the SMS was sent or not response text, // Request response service_provider text, // e.g Twilio, Telnyx, Venmo etc date timestamp, // Time SMS is sent

create table sms_logs_by_id ( id text, phone_number text, message text, status boolean, response text, provider text, service_provider text, date timestamp, primary key (id, date) ) with clustering order by (date DESC); create table sms_logs_by_service_provider ( id text, phone_number text, message text, status boolean, response text, provider text, service_provider text, date timestamp, primary key (service_provider, date) ) with clustering order by (date DESC); create table sms_logs_by_phone_number ( id text, phone_number text, message text, status boolean, response text, provider text, service_provider text, date timestamp, primary key (phone_number, date) ) with clustering order by (date DESC); create table sms_logs_by_status ( id text, phone_number text, message text, status boolean, response text, provider text, service_provider text, date timestamp, primary key (status, date) ) with clustering order by (date DESC);

The only potential issue I see is that for the last 3 tables (logs by status, phone number and provider), is that the partitions will get larger over time. It's important to remember that Cassandra has a mathematical limit of 2 billion cells per partition (where a "cell" == a column value or key). But you want to model you data so that you don't get anywhere near that limit, because your table will start getting slow long before that.

For these three, I'd recommend a "bucketing" approach:

sms_logs_by_service_provider
...
primary key (service_provider, date)

For this one, my other concern is the fact that you're tracking 3 service providers. So in addition to the partitions growing with each message, there's only 3 partitions. So data isn't distributing very well. With thousands of messages sent per day, I'm thinking that your "bucket" is going to need to be fairly precise...probably by "day." Maybe you could get away with a "week_bucket," but I'll use day for this example:

id                  text,
provider            text,
service_provider    text,
day_bucket.         int,
date                timestamp,
PRIMARY KEY ((service_provider, day_bucket), date, id)

This way, you're creating a partition for each combination of service_provider and day. That will give you plenty of data distribution, plus your partitions won't grow beyond the activity which happens in a single day. I've kept date as a DESC clustering key (good idea) but added id as a "tie-breaker," just in case two messages have the exact same timestamp.

create table sms_logs_by_phone_number (
...
primary key (phone_number, date)

So for this one, I'd take a similar approach. But as we're talking about individual users, we can use a much larger bucket. Based on a quick Google search, the average person sends 85 text messages per day, which comes out to 31,025 per year. That's probably ok to store by year.

id                  text,
phone_number        text,
year_bucket         int,
date                timestamp,
PRIMARY KEY ((phone_number, year_bucket), date, id)

Partitioning by phone_number already gives you some good distribution. Adding year_bucket in there will ensure that the partition won't have unbound growth. Also, id for a tie-breaker.

create table sms_logs_by_status(
...
primary key (status, date)

Logs by status is going to have a similar problem to the "provider" table, in that you probably only have a few statuses, so the data distribution will be limited. For this one, you're probably also going to want to use a bucket that's small, like by day.

id                  text,
status              text,
day_bucket.         int,
date                timestamp,
PRIMARY KEY ((status, day_bucket), date, id)

The unfortunate part, is these changes this complicate your query patterns. But they're necessary to save you from problems down the road.

Recommended topics

Hot tags