Modelling Cassandra tables for upsert and select queries

I have designed following table to store server alarms:

create table IF NOT EXISTS host_alerts(
    unique_key text,
    host_id text,
    occur_time timestamp,
    clear_time timestamp,
    last_occur timestamp,
    alarm_name text,
    primary key (unique_key,host_id,clear_time)
);

Let's enter some data:

truncate host_alerts;

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:01:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'2015-07-01 00:02:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

The queries my application will be running are:

//All alarms which are NOT cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time = '1970-01-01 00:00:00+0530';

//All alarms which are cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time > '2015-07-01 00:00:00+0530';

//All alarms whose first occurrence (occur_time) falls within a time range
select * from host_alerts where host_id = 'server-1'
and occur_time > '2015-07-01 00:02:00+0530' and occur_time < '2015-07-01 00:05:00+0530';

I don't know whether I should prepare more tables, for example host_alerts_by_hostname or host_alerts_by_cleartime and so on, or simply add a clustering index. unique_key is the only unique column, but I need to retrieve data by other columns.

Not-cleared alarms have clear_time = '1970-01-01 00:00:00+0530'; a cleared event has an actual date value.

host_id is the server name.

occur_time is when the event occurred.

last_occur is the time when the event reoccurred.

alarm_name is what happened with the system.

How can I model my table so that I can perform these queries and update based on unique_key? With what I have tried, the selects are not possible, and during an upsert a new row is created for the same unique_key.

Continence answered 8/8, 2015 at 10:50 Comment(0)

I think you probably need three tables to support your three query types.

The first table would support time range queries about the history of when alerts happened for each host:

CREATE TABLE IF NOT EXISTS host_alerts_history (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, occur_time)
);

SELECT * FROM host_alerts_history WHERE host_id = 'server-1' AND occur_time > '2015-08-16 10:05:37-0400';

The second table would keep track of the uncleared alarms for each host:

CREATE TABLE IF NOT EXISTS host_uncleared_alarms (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_name)
);

SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';

The last table would keep track of when alerts were cleared for each host:

CREATE TABLE IF NOT EXISTS host_alerts_by_cleartime (
    host_id text,
    clear_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, clear_time)
);

SELECT * FROM host_alerts_by_cleartime WHERE host_id = 'server-1' AND clear_time > '2015-08-16 10:05:37-0400';

When a new alarm event arrives, you'd execute this batch:

BEGIN BATCH
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

Note that the insert into the uncleared table is an upsert, since the timestamp is not part of the primary key. So that table will only have one entry for each alarm name, with a timestamp of the last occurrence.
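
For illustration, here's a sketch of that upsert behaviour against the host_uncleared_alarms table above (the timestamps are made up):

// Two occurrences of the same alarm on the same host target the same row,
// so the second INSERT overwrites occur_time rather than adding a new row.
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:01:00+0530', 'disk full');

INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:02:00+0530', 'disk full');

// Returns a single row with occur_time = 2015-07-01 00:02:00+0530
SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';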

When an alarm clear event arrives, you'd execute this batch:

BEGIN BATCH
DELETE FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';
INSERT INTO host_alerts_by_cleartime (host_id, clear_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

I didn't really understand what your "unique_key" is or where it comes from. I'm not sure it is needed since the combination of host_id and alarm_name should be the level of granularity you want to work with. Adding another unique key into the mix could give rise to a lot of unmatched alert/clear events. If unique_key is an alarm id, then use that as the key in place of alarm_name in my example and have alarm_name as a data column.
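
If you do go that route, the uncleared table might look something like this (just a sketch; the host_uncleared_alarms_by_id name and the alarm id value are hypothetical):

CREATE TABLE IF NOT EXISTS host_uncleared_alarms_by_id (
    host_id text,
    alarm_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_id)
);

// alarm_name becomes a plain data column; the alarm id now drives the upsert.
INSERT INTO host_uncleared_alarms_by_id (host_id, alarm_id, occur_time, alarm_name)
VALUES ('server-1', '1', dateof(now()), 'disk failure');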

To prevent your tables from filling up over time with old data, you could use the TTL feature to automatically delete rows after several days.
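
For example (a sketch; the seven-day TTL is just an arbitrary value):

// Rows written with a TTL are deleted automatically once it expires
// (here, 604800 seconds = 7 days).
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name)
VALUES ('server-1', dateof(now()), 'disk full')
USING TTL 604800;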

Schmeltzer answered 16/8, 2015 at 14:20 Comment(4)
Thanks for the really nice answer. unique_key is a random key generated in an RDBMS. Does Cassandra have a feature to automatically replicate data among tables? I need to check the clear_time field each time; will it not slow down performance? Also, in the third one I think you mean occur_time?Continence
How do I do this for 100-1000 alarms per sec?Continence
Cassandra 3.0 will have support for materialized views to propagate data from one table to another, but that release won't be available for a while. I don't understand what you mean about checking clear_time each time. You want to avoid doing a read before write in Cassandra since it will greatly reduce transaction throughput.Schmeltzer
Handling 1000 alarms per second should be no problem. You can do asynchronous operations with Cassandra and easily achieve that rate of throughput.Schmeltzer
