MongoDB Index Complexity
Asked Answered
E

1

13

I really like MongoDB, I use it at work and home, and not once yet have I hit a performance, complexity, or limitation issue with it. But I've been thinking about indexes a lot and I had a question I've not found an adequate answer to.

One of the big issues with SQL databases at scale is the relative complexity of queries. Specifically, MySQL uses b-trees for most of it's indexes, which querying takes O(log(n)), better than linear, but still means things take longer the more data you have.

A big attraction of noSQL databases is the removal/mitigation of this scaling issue, often relying instead on hash style indexes, which have O(1) lookup time, so having more data doesn't slow down your app. This is where my question comes in:

According to the offical MongoDB documentation, all indexes in Mongo use b-trees. Despite the fact that Mongo does in fact have a hashed index, as far as I can tell these are still stored in b-trees, same with the index on the _id field. I couldn't even find anything indicating anything about constant time anywhere in Mongo's documentation!

So my question is this: are, in fact, all indexes (including _id and hashed) in Mongo stored in b-trees? Does this mean querying for keys (even by _id) in fact takes O(log(n)) time?

Addendum: As a point of note, I'd be great if Mongo documentation provided some complexity formulas with examples queries. My favorite example of this is the Redis documentation.

Also: This is related. But I have the added specific questions regarding the hashed indexes and (more importantly) the _id index.

Eam answered 18/9, 2014 at 19:26 Comment(0)
T
19

If you look at the code for indexing in mongodb (here), you can easily see that it's using btree for indexing. So the order of the algorithm is O(log n), but the base of this logarithm function is not 2, but 8192 instead, which is here in the code.

So for a million records we only have two levels (assuming the tree is balanced) and that is why it can find the record so fast. Overall, it's true the order is logarithmic, but since the base of the logarithm function is so large, it grows slowly.

Tynes answered 18/9, 2014 at 20:3 Comment(3)
I had written out the big thing about thinking they were binary nodes with large bodies, under the assumption b-tree meant 'binary tree', then wikipedia informed me of the generalization. So thank you this answers my question! As an aside, it looks like the b-tree's in mongo are in fact self balancing.Eam
O(log n) is the same regardless of base since logarithms to different bases different by a constant factor. 8192 is a bucket size (number of bytes) and doesn't have to do with the computational complexity of the Btree algorithm, just with (important) constant factors having to do with disk i/o. It's not a general fact that NoSQL databases use hashes to achieve O(1) lookups- that's more a property of an in-memory key-value store. Absence/avoidance of joins is more characteristic of NoSQL as complex joins are a bottleneck for relational db's.Tinderbox
Is this answer still valid for newer mongodb like v6?Kennethkennett

© 2022 - 2024 — McMap. All rights reserved.