Cassandra CQL searching for element in list

To directly answer your question, yes there is a way to accomplish this. As of Cassandra 2.1 you can create a secondary index on a collection. First, I'll re-create your column family definition (while adding a definition for upload_timestamp timeuuid) and put some values in it.

aploetz@cqlsh:stackoverflow> SELECT * FROM videos ;

 video_id                             | upload_timestamp                     | tags                                          | title
--------------------------------------+--------------------------------------+-----------------------------------------------+---------------------------
 2977b806-df76-4dd7-a57e-11d361e72ce1 | fc011080-64f9-11e4-a819-21b264d4c94d |             ['sci-fi', 'action', 'adventure'] |                 Star Wars
 ab696e1f-78c0-45e6-893f-430e88db7f46 | 8db7c4b0-64fa-11e4-a819-21b264d4c94d |                               ['documentary'] | The Witches of Whitewater
 15e6bc0d-6195-4d8b-ad25-771966c780c8 | 1680d120-64fa-11e4-a819-21b264d4c94d | ['dark comedy', 'action', 'language warning'] |              Pulp Fiction

(3 rows)

Next, I'll create a secondary index on the tags column:

aploetz@cqlsh:stackoverflow> CREATE INDEX ON videos (tags);

Now, if I want to query the videos that contain the tag "action," I can accomplish this with the CONTAINS keyword:

aploetz@cqlsh:stackoverflow> SELECT * FROM videos WHERE tags CONTAINS 'action';

 video_id                             | upload_timestamp                     | tags                                          | title
--------------------------------------+--------------------------------------+-----------------------------------------------+--------------
 2977b806-df76-4dd7-a57e-11d361e72ce1 | fc011080-64f9-11e4-a819-21b264d4c94d |             ['sci-fi', 'action', 'adventure'] |    Star Wars
 15e6bc0d-6195-4d8b-ad25-771966c780c8 | 1680d120-64fa-11e4-a819-21b264d4c94d | ['dark comedy', 'action', 'language warning'] | Pulp Fiction

(2 rows)

With this all being said, I should pass along a couple of warnings:

Secondary indexes do not perform well at scale. They exist to provide convenience, not performance. If you are expecting to have to query by tag often, then the right way to solve this would be to create a videosbytag query table, with the same data but keyed like this: PRIMARY KEY (tag,video_id)
You don't need the double-quotes in your table name. In fact, having it in quotes may cause you problems (ok, maybe minor irritations) down the road.

Recommended topics

Hot tags