I have the following table defined in Cassandra 2.0.9:
CREATE TABLE history
(
histid uuid,
ddate text, -- Day Date, i.e. 2014-11-20
valtime timestamp, -- value time
val text, --value
PRIMARY KEY ((histid , ddate), valtime )
)
WITH CLUSTERING ORDER BY (valtime desc)
;
Scripts insert several thousand rows into this table daily.
I need to be able to select from this table knowing only the histid. However, i've partitioned the rows using (histid , ddate). Meaning, I have a full day of history values per row.
In order to select from this table for a particular histid, I also need to provide the ddate column. For example:
SELECT * FROM history
WHERE histid= cebc4c80-daa6-11e3-bcc2-005056a975a4
AND ddate = '2014-05-16'
;
To get the most recent value, I can do the following:
SELECT * FROM history
WHERE histid= cebc4c80-daa6-11e3-bcc2-005056a975a4
AND ddate = '2014-05-16'
LIMIT 1
;
However, If i want the most recent value for any given histid, I can't submit the query without knowing what ddate is, since it is part of the partition key.
So...I ask, what would be the best way to approach this?
This is what i've done, but i don;'t know if it's reasonable:
I've created a secondary table:
CREATE TABLE history_date
(
histid uuid,
maxdate timestamp, -- most recent date
PRIMARY KEY (histid)
);
When a row is inserted into the history table, a row is also inserted into this table using, (histid, valtime).
Our program code can then:
1. query the history_date table for a particular id
2. take the "maxdate" column (truncate it to yyyy-mm-dd)
3. use the histid and truncated maxdate to query the history table to retrieve the most recent value.
So this works. But, it doesn't really feel like a good solution.
Is there a better way to do this, perhaps with just a single table?
Thanks for your time.