Modelling Cassandra tables for upsert and select queries

I have designed following table to store server alarms:

create table IF NOT EXISTS host_alerts(
    unique_key text,
    host_id text,
    occur_time timestamp,
    clear_time timestamp,
    last_occur timestamp,
    alarm_name text,
    primary key (unique_key,host_id,clear_time)
);

Let's enter some data:

truncate host_alerts;

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:01:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'2015-07-01 00:02:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

The queries my application will be running are:

//All alarms which are NOT cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time = '1970-01-01 00:00:00+0530';

//All alarms which are cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time > '2015-07-01 00:00:00+0530';

//All alarms whose first occurrence (occur_time) falls within a time range
select * from host_alerts where host_id = 'server-1'
and occur_time > '2015-07-01 00:02:00+0530' and occur_time < '2015-07-01 00:05:00+0530';

I don't know whether I should prepare more tables, for example host_alerts_by_hostname or host_alerts_by_cleartime and so on, or simply add a clustering index. unique_key is the only unique column, but I need to retrieve data by other columns.

Not-cleared alarms have clear_time = '1970-01-01 00:00:00+0530'; a cleared event has an actual date value.

host_id is the server name.

occur_time is when the event occurred.

last_occur is the time when the event reoccurred.

alarm_name is what happened with the system.

How can I model my table so that I can perform these queries and update based on unique_key? With what I have tried, the selects are not possible, and during an upsert a new row is created for the same unique_key.

Continence answered 8/8, 2015 at 10:50 Comment(0)

I think you probably need three tables to support your three query types.

The first table would support time range queries about the history of when alerts happened for each host:

CREATE TABLE IF NOT EXISTS host_alerts_history (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, occur_time)
);

SELECT * FROM host_alerts_history WHERE host_id = 'server-1' AND occur_time > '2015-08-16 10:05:37-0400';

The second table would keep track of the uncleared alarms for each host:

CREATE TABLE IF NOT EXISTS host_uncleared_alarms (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_name)
);

SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';

The last table would keep track of when alerts were cleared for each host:

CREATE TABLE IF NOT EXISTS host_alerts_by_cleartime (
    host_id text,
    clear_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, clear_time)
);

SELECT * FROM host_alerts_by_cleartime WHERE host_id = 'server-1' AND clear_time > '2015-08-16 10:05:37-0400';

When a new alarm event arrives, you'd execute this batch:

BEGIN BATCH
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

Note that the insert into the uncleared table is an upsert, since the timestamp is not part of the primary key. So that table will only have one entry for each alarm name, with a timestamp of the last occurrence.
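
For illustration, here's a sketch of that upsert behaviour against the host_uncleared_alarms table above (the timestamps are made up):

// Two occurrences of the same alarm on the same host target the same row,
// so the second INSERT overwrites occur_time rather than adding a new row.
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:01:00+0530', 'disk full');

INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:02:00+0530', 'disk full');

// Returns a single row with occur_time = 2015-07-01 00:02:00+0530
SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';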

When an alarm clear event arrives, you'd execute this batch:

BEGIN BATCH
DELETE FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';
INSERT INTO host_alerts_by_cleartime (host_id, clear_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

I didn't really understand what your "unique_key" is or where it comes from. I'm not sure it is needed since the combination of host_id and alarm_name should be the level of granularity you want to work with. Adding another unique key into the mix could give rise to a lot of unmatched alert/clear events. If unique_key is an alarm id, then use that as the key in place of alarm_name in my example and have alarm_name as a data column.
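
If you do go that route, the uncleared table might look something like this (just a sketch; the host_uncleared_alarms_by_id name and the alarm id value are hypothetical):

CREATE TABLE IF NOT EXISTS host_uncleared_alarms_by_id (
    host_id text,
    alarm_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_id)
);

// alarm_name becomes a plain data column; the alarm id now drives the upsert.
INSERT INTO host_uncleared_alarms_by_id (host_id, alarm_id, occur_time, alarm_name)
VALUES ('server-1', '1', dateof(now()), 'disk failure');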

To prevent your tables from filling up over time with old data, you could use the TTL feature to automatically delete rows after several days.
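
For example (a sketch; the seven-day TTL is just an arbitrary value):

// Rows written with a TTL are deleted automatically once it expires
// (here, 604800 seconds = 7 days).
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name)
VALUES ('server-1', dateof(now()), 'disk full')
USING TTL 604800;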

Schmeltzer answered 16/8, 2015 at 14:20 Comment(4)
Thanks for the really nice answer. unique_key is a random key generated in an RDBMS. Does Cassandra have a feature to automatically replicate data among tables? I need to check the clear_time field each time; will it not slow down performance? Also, in the third one I think you mean occur_time?Continence
How do I do this for 100-1000 alarms per sec?Continence
Cassandra 3.0 will have support for materialized views to propagate data from one table to another, but that release won't be available for a while. I don't understand what you mean about checking clear_time each time. You want to avoid doing a read before write in Cassandra since it will greatly reduce transaction throughput.Schmeltzer
Handling 1000 alarms per second should be no problem. You can do asynchronous operations with Cassandra and easily achieve that rate of throughput.Schmeltzer
