Cassandra and Secondary-Indexes, how do they work internally?

How does a Cassandra secondary index work internally? The docs state that it is some kind of hash index.

Given that I have the column username="foobar" (username will be the column with the secondary index) in a column family User that uses RandomPartitioner:

  1. Is my assumption correct that Cassandra uses a "distributed hash index" (i.e., the index is not on one single node but is split across nodes)?
  2. On how many nodes are the index parts held (the same number as the replication factor)?
  3. On which nodes are the index parts held (does Cassandra split the index using the same logic as the row key with RandomPartitioner)?

  4. If the index is held on only one node (and, of course, replicated), how does Cassandra determine the node that is responsible for the index (by hashing the column name and then using the RandomPartitioner logic to pick the node)?

  5. Is it really true that this index is optimized for low cardinality? If so, is there a rough estimate or a concrete figure I can use to judge when I should not use a secondary index (and should use a separate CF for the index instead)? Put differently, how do I calculate the cardinality and make the right decision?
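Questions 3 and 4 come down to how RandomPartitioner places data on the ring. The sketch below is a simplified model, assuming SimpleStrategy replica placement and a hypothetical four-node ring with evenly spaced tokens; the function names are illustrative and are not Cassandra's API. It shows that the token is derived from the MD5 of the row key, and that the replicas are the first node whose token is at or past that value plus the next RF - 1 nodes clockwise.

```python
import bisect
import hashlib

# abs() of a signed 128-bit value lies in [0, 2**127].
TOKEN_SPACE = 2**127


def random_partitioner_token(row_key: bytes) -> int:
    """MD5 the row key, read the digest as a signed big-endian integer,
    and take its absolute value (the RandomPartitioner approach)."""
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replicas_for(row_key: bytes, ring: dict[int, str], rf: int) -> list[str]:
    """SimpleStrategy-style placement (assumed): the first node whose token
    is >= the key's token, wrapping past the end of the ring, followed by
    the next rf - 1 nodes clockwise."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, random_partitioner_token(row_key)) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(min(rf, len(tokens)))]


# Hypothetical four-node ring with evenly spaced tokens.
ring = {i * TOKEN_SPACE // 4: f"node-{i}" for i in range(4)}
print(replicas_for(b"some-user-key", ring, rf=3))
```

Note that only the row key feeds the partitioner here; a column value such as username="foobar" plays no part in deciding which nodes store the row.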

I am trying to understand this.

Molokai asked 20/6, 2011 at 22:12
What's the "why" behind this question? Is there a specific problem you're trying to solve, or are you just trying to fill some gaps in your understanding? – Resident
The underlying question is about performance. Managing the indexes manually as new CFs is very tedious. Secondary indexes are easy to maintain but, as the cardinality problem suggests, don't seem to fit some important needs. Furthermore, I could offload some of the indexing work to other layers of my app. So I want to understand a little of how secondary indexes work internally, what their pros and cons are, and how they relate to performance. – Molokai

Secondary indexes are basically just another column family. They are not directly accessible to users, but you can see statistics via the JMX bean: org.apache.cassandra.db.IndexedColumnFamilies

You can consult the statistics here to gauge the effectiveness of your index as you would a normal column family.
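To make "just another column family" concrete: in Cassandra's built-in secondary indexes (the 0.7-era KEYS indexes), each node indexes only the rows it stores itself, so the index CF maps an indexed value to the local row keys that carry it. The sketch below is a hypothetical in-memory model of that idea, not Cassandra code; in particular, `owner()` uses `zlib.crc32` as a stand-in for the real partitioner.

```python
import zlib
from collections import defaultdict


class Node:
    """Hypothetical node: a slice of the base column family plus a
    node-local index column family (indexed value -> row keys)."""

    def __init__(self, name: str, indexed_column: str) -> None:
        self.name = name
        self.indexed_column = indexed_column
        self.rows: dict[str, dict[str, str]] = {}
        self.index: dict[str, set[str]] = defaultdict(set)

    def write(self, row_key: str, columns: dict[str, str]) -> None:
        # Drop the stale index entry, then merge the new columns in.
        old_row = self.rows.get(row_key, {})
        old_value = old_row.get(self.indexed_column)
        if old_value is not None:
            self.index[old_value].discard(row_key)
        merged = {**old_row, **columns}
        self.rows[row_key] = merged
        new_value = merged.get(self.indexed_column)
        if new_value is not None:
            self.index[new_value].add(row_key)


def owner(nodes: list[Node], row_key: str) -> Node:
    # Stand-in partitioner: the ROW KEY, not the indexed value, decides
    # which node stores the row and therefore its index entry.
    return nodes[zlib.crc32(row_key.encode()) % len(nodes)]


def query_eq(nodes: list[Node], value: str) -> tuple[list[str], int]:
    """Every index is local, so every node must be asked.
    Returns (matching row keys, number of nodes contacted)."""
    keys: list[str] = []
    contacted = 0
    for node in nodes:
        contacted += 1
        keys.extend(sorted(node.index.get(value, set())))
    return keys, contacted


nodes = [Node(f"node-{i}", "username") for i in range(4)]
for i in range(100):
    owner(nodes, f"user{i}").write(f"user{i}", {"username": f"name{i}"})
owner(nodes, "user42").write("user42", {"username": "foobar"})
print(query_eq(nodes, "foobar"))  # (['user42'], 4)
```

In this model, a unique column like username makes every lookup contact all four nodes to return a single row, while a low-cardinality column would return many rows for the same fan-out. That is one way to read the "low cardinality" advice in question 5.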

For more details see these previous posts:

How are Cassandra's 0.7 Secondary Indexes stored?

How scalable are automatic secondary indexes in Cassandra 0.7?

And since you have a hector tag, here is a link to the test case for IndexedSlicesQuery: https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/cassandra/model/IndexedSlicesQueryTest.java
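An indexed slices query takes a list of expressions (column, operator, value). In Cassandra 0.7, at least one of them must be an EQ on an indexed column: that expression drives the index lookup, and the remaining expressions filter the candidate rows. The hypothetical model below uses plain dicts (it is not Hector's or Cassandra's API) to show that rule, which is what the error in the comment below complains about.

```python
import operator

# Comparison operators an expression may use in this model.
OPS = {
    "EQ": operator.eq,
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
}


class InvalidRequest(Exception):
    """Hypothetical stand-in for InvalidRequestException."""


def indexed_slices(rows, index, indexed_columns, expressions):
    """rows: row key -> {column: value}
    index: (column, value) -> set of row keys
    indexed_columns: names of columns that have a secondary index
    expressions: list of (column, op, value) tuples"""
    # One EQ expression on an indexed column drives the index lookup.
    driver = next(
        (e for e in expressions if e[1] == "EQ" and e[0] in indexed_columns),
        None,
    )
    if driver is None:
        raise InvalidRequest(
            "No indexed columns present in index clause with operator EQ"
        )
    column, _, value = driver
    matches = []
    for key in sorted(index.get((column, value), set())):
        cols = rows[key]
        # Every expression is re-checked against the row itself.
        if all(c in cols and OPS[op](cols[c], v) for c, op, v in expressions):
            matches.append(key)
    return matches


rows = {
    "u1": {"state": "UT", "birth_year": 1975},
    "u2": {"state": "UT", "birth_year": 1985},
    "u3": {"state": "CA", "birth_year": 1990},
}
index = {("state", "UT"): {"u1", "u2"}, ("state", "CA"): {"u3"}}
print(indexed_slices(rows, index, {"state"},
                     [("state", "EQ", "UT"), ("birth_year", "GT", 1980)]))  # ['u2']
```

If a clause does contain an EQ on the indexed column and the error still appears, one thing worth checking is whether the column name in the clause is serialized to the same bytes as the indexed column's name, since the check matches on the name.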

Aquarist answered 20/6, 2011 at 22:52
I have followed this example very closely and I keep getting the dreaded: Caused by: InvalidRequestException(why:No indexed columns present in index clause with operator EQ) – Idalia
