Inserting arbitrary columns in Cassandra using CQL3
Asked Answered
P

2

10

Prior to CQL3 one could insert arbitrary columns such as columns that are named by a date:

cqlsh:test>CREATE TABLE seen_ships (day text PRIMARY KEY)
                WITH comparator=timestamp AND default_validation=text;
cqlsh:test>INSERT INTO seen_ships (day, '2013-02-02 00:08:22')
                VALUES ('Tuesday', 'Sunrise');

Per this post It seems that things are different in CQL3. Is it still somehow possible to insert arbitrary columns? Here's my failed attempt:

cqlsh:test>CREATE TABLE seen_ships (
    day text,
    time_seen timestamp,
    shipname text,
    PRIMARY KEY (day, time_seen)
);

cqlsh:test>INSERT INTO seen_ships (day, 'foo') VALUES ('Tuesday', 'bar');

Here I get Bad Request: line 1:29 no viable alternative at input 'foo'

So I try a slightly different table because maybe this is a limitation of compound keys:

cqlsh:test>CREATE TABLE seen_ships ( day text PRIMARY KEY );
cqlsh:test>INSERT INTO seen_ships (day, 'foo') VALUES ('Tuesday', 'bar');

Again with the Bad Request: line 1:29 no viable alternative at input 'foo'

What am I missing here?

Prefatory answered 3/7, 2013 at 3:29 Comment(0)
H
17

There's a good blog post over on the Datastax blog about this: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

The answer is that yes, CQL3 supports dynamic colums, just not the way it worked in earlier versions of CQL. I don't really understand your example, you mix datestamps with strings in a way I don't see how it worked in CQL2 either. If I understand you correctly you want to make a timeline of ship sightings, where the partition key (row key) is the day and each sighting is a time/name pair. Here's a suggestion:

CREATE TABLE ship_sightings (
  day TEXT,
  time TIMESTAMP,
  ship TEXT,
  PRIMARY KEY (day, time)
)

And you insert entries with

INSERT INTO ship_sightings (day, time, ship) VALUES ('Tuesday', NOW(), 'Titanic')

however, you should probably use a TIMEUUID instead of TIMESTAMP (and the primary key could be a DATE), since otherwise you might add two sightings with the same timestamp and only one will survive.

This was an example of wide rows, but then there's the issue of dynamic columns, which isn't exactly the same thing. Here's an example of that in CQL3:

CREATE TABLE ship_sightings_with_properties (
  day TEXT,
  time TIMEUUID,
  ship TEXT,
  property TEXT,
  value TEXT,
  PRIMARY KEY (day, time, ship, property)
)

which you can insert into like this:

INSERT INTO ship_sightings_with_properties (day, time, ship, property, value)
VALUES ('Sunday', NOW(), 'Titanic', 'Color', 'Black')
# you need to repeat the INSERT INTO for each statement, multiple VALUES isn't
# supported, but I've not included them here to make this example shorter
VALUES ('Sunday', NOW(), 'Titanic', 'Captain', 'Edward John Smith')
VALUES ('Sunday', NOW(), 'Titanic', 'Status', 'Steaming on')
VALUES ('Monday', NOW(), 'Carapathia', 'Status', 'Saving the passengers off the Titanic')

The downside with this kind of dynamic columns is that the property names will be stored multiple times (so if you have a thousand sightings in a row and each has a property called "Captain", that string is saved a thousand times). On-disk compression takes away most of that overhead, and most of the time it's nothing to worry about.

Finally a note about collections in CQL3. They're a useful feature, but they are not a way to implement wide rows or dynamic columns. First of all they have a limit of 65536 items, but Cassandra can't enforce this limit, so if you add too many elements you might not be able to read them back later. Collections are mostly for small multi-values fields -- the canonical example is an address book where each row is an entry and where entries only have a single name, but multiple phone numbers, email addresses, etc.

Haggadist answered 3/7, 2013 at 18:33 Comment(2)
Won't multiple calls to "NOW()" create different values? Hence the keys for each Titanic item will be different?Exacting
In this case we want the values to be different, they're IDs for different events. However, IIRC Cassandra, and most databases will evaluate the value of NOW() once for each statement (which doesn't mean that the IDs will be the same, NOW() returns a TIMEUUID, which is unique but also contains a timestamp, so NOW() and NOW() in the same statement will create two distinct TIMEUUID values that have identical time components).Haggadist
B
0

It is not truly dynamic column, but most times you can get away with collections. Using Map column you might store some dynamic data

Bashan answered 28/2, 2015 at 7:59 Comment(1)
-1 How is it not "truly dynamic" column? No explanation provided. And collection is a poor replacement for dynamic data that has big range.Carver

© 2022 - 2024 — McMap. All rights reserved.