create index creates a secondary index for the table. In cassandra, data is stored in partitions across nodes - one partition corresponds to one partition key - which is the first key of the primary key. Remaining keys in the primary key constitute the clustering keys. For example, if you had the following:
CREATE TABLE foo.people (
id int,
name text,
age int,
job text,
PRIMARY KEY (id, name, job)
)
id would be the partition key, and name and job would be the clustering keys.
Data in a partition is stored in order of the clustering keys. When querying with filters, you specify a partition key, and then you can filter down based on clustering keys. For multiple clustering keys, you must specify previous clustering in order to use a particular one. For example, in the mentioned scenario, you can do
where id = 2 and name = 'john' and job = 'dev' or
where id = 2 and name = 'john'
but not
where id = 2 and job = 'dev'
as name appears before job in the clustering key.
You can't do a filter on age as it's not part of a key. This is where the secondary index comes in. If you then do:
create index blah on people(age)
you will be allowed to do this:
select * from people where age = 45;
This can potentially be expensive as it will query across your cluster. The following though, can be efficient:
select * from people where id=2 and age = 45;
This is useful for time series or other wide row formats.
Queries on secondary indices are restrictive - you can't do range queries for example - you're limited to = checks.
Secondary indices in cassandra can save you the hassle of maintaining index tables yourself, and are more efficient than if you'd done so manually. They are eventually consistent (your writes won't wait for indices to be updated to return success) and currently, index info for a node's data is stored locally.
Lastly, you can find the indexes currently in place from the "IndexInfo" table in the system keyspace.
Hope that helps.