To directly answer your question, yes there is a way to accomplish this. As of Cassandra 2.1 you can create a secondary index on a collection. First, I'll re-create your column family definition (while adding a definition for upload_timestamp timeuuid
) and put some values in it.
aploetz@cqlsh:stackoverflow> SELECT * FROM videos ;
video_id | upload_timestamp | tags | title
--------------------------------------+--------------------------------------+-----------------------------------------------+---------------------------
2977b806-df76-4dd7-a57e-11d361e72ce1 | fc011080-64f9-11e4-a819-21b264d4c94d | ['sci-fi', 'action', 'adventure'] | Star Wars
ab696e1f-78c0-45e6-893f-430e88db7f46 | 8db7c4b0-64fa-11e4-a819-21b264d4c94d | ['documentary'] | The Witches of Whitewater
15e6bc0d-6195-4d8b-ad25-771966c780c8 | 1680d120-64fa-11e4-a819-21b264d4c94d | ['dark comedy', 'action', 'language warning'] | Pulp Fiction
(3 rows)
Next, I'll create a secondary index on the tags
column:
aploetz@cqlsh:stackoverflow> CREATE INDEX ON videos (tags);
Now, if I want to query the videos that contain the tag "action," I can accomplish this with the CONTAINS
keyword:
aploetz@cqlsh:stackoverflow> SELECT * FROM videos WHERE tags CONTAINS 'action';
video_id | upload_timestamp | tags | title
--------------------------------------+--------------------------------------+-----------------------------------------------+--------------
2977b806-df76-4dd7-a57e-11d361e72ce1 | fc011080-64f9-11e4-a819-21b264d4c94d | ['sci-fi', 'action', 'adventure'] | Star Wars
15e6bc0d-6195-4d8b-ad25-771966c780c8 | 1680d120-64fa-11e4-a819-21b264d4c94d | ['dark comedy', 'action', 'language warning'] | Pulp Fiction
(2 rows)
With this all being said, I should pass along a couple of warnings:
- Secondary indexes do not perform well at scale. They exist to provide convenience, not performance. If you are expecting to have to query by tag often, then the right way to solve this would be to create a
videosbytag
query table, with the same data but keyed like this: PRIMARY KEY (tag,video_id)
- You don't need the double-quotes in your table name. In fact, having it in quotes may cause you problems (ok, maybe minor irritations) down the road.