Aerospike: How Primary & Secondary Index works internally
Asked Answered
E

1

5

We are using Aerospike DB and was going through the documentation.
I could not find good explanation of algorithm explaining how Primary & Secondary index works.
The documentation says it uses some sort of distributed hash + B Tree.

Could someone please explain it.

Elzaelzevir answered 27/7, 2018 at 6:2 Comment(0)
D
7

The primary index is a mix of a distributed hash and distributed trees. It holds the metadata for every record in the Aerospike cluster.

Each namespace has 4096 partitions that are evenly distributed to the nodes of the cluster, by way of the partition map. Within the node, the primary index is an in-memory structure that indexes only the partitions assigned to the node.

The primary index has a hash table that leads to sprigs. Each sprig is a red-black tree that holds a portion of the metadata. The number of sprigs per-partition is configurable through partition-tree-sprigs.

Therefore, to find any record in the cluster, the client first uses the record's digest to find the correct node with one lookup against the partition map. Then, the node holding the master partition for the record will look up its metadata in the primary index. If this namespace stores data on SSD, the metadata includes the device, block ID and byte offset of the record, so it can be read with a single read operation. The records are stored contiguously, whether on disk or in memory.

The primary index is used for operations against a single record (identified by its key), or batch operations against multiple records (identified by a list of keys). It's also used by scans.

Secondary indexes are optional in-memory structures within each node of the cluster, that also only index the records of the partitions assigned to each node. They're used for query operations, which are intended to return many records based on a non-key predicate.

Because Aerospike is a distributed database, a query must go to all the nodes. The concurrency level (how many nodes are queried at a time) is controlled through a query policy in the client. Each node receiving the query has to lookup the criteria of the predicate against the appropriate secondary index. This returns zero to many records. At this point the optional predicate filter can be applied. The records found by secondary index query are then streamed back to the client. See the documentation on managing indexes.

Dhruv answered 27/7, 2018 at 14:27 Comment(4)
Also: #44075990Dhruv
Very nice and detailed answer. Love you : )Elzaelzevir
@RonenBotzer excuse me, can you explain why Aerospike is using B-tree for RAM-placed secondary indexes? As I know the main advantage of b-trees is when indexes are forces to be on HDDs (like in traditional databases). Why Aerospike doesnt use r-b trees in secondary index, for example?Nobukonoby
Not sure, didn’t write it myself. In the last user summit it was announced that secondary indexes are getting redone in 2019, so let’s see what they’re like then.Dhruv

© 2022 - 2024 — McMap. All rights reserved.