I need to store several billion small data structures (around 200 bytes each). So far, storing each element as a separate document is working well, with Mongo providing around 10,000 results per second. I'm using a 20-byte hash as the _id for each document, and a single index on the _id field. In testing, this works for data sets of 5,000,000 documents.
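For context, here is roughly how we store and look up each element. The collection name, payload fields, and the use of SHA-1 are illustrative, not necessarily our real schema:

```python
import hashlib
from bson.binary import Binary
from pymongo import MongoClient

client = MongoClient()
col = client.mydb.elements  # illustrative database/collection names

def store(key: bytes, payload: dict):
    # 20-byte SHA-1 digest as the _id; the default _id index is the only index
    doc = {"_id": Binary(hashlib.sha1(key).digest()), **payload}
    col.replace_one({"_id": doc["_id"]}, doc, upsert=True)

def fetch(key: bytes):
    # point lookup by the hashed key
    return col.find_one({"_id": Binary(hashlib.sha1(key).digest())})
```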
In operation, we will be making around 10,000 requests per second, updating existing documents about 1,000 times per second, and inserting new documents maybe 100 times per second or less.
How can we manage larger data sets, when we cannot store the entire index in RAM? Will MongoDB perform better if we combine several elements into each document -- giving a faster search through a smaller index, at the cost of more data returned per query?
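By "combining several elements" I mean something like the bucketing sketch below. The bucket size, the prefix scheme, and the field names are all hypothetical, just to make the question concrete:

```python
from bson.binary import Binary

# Hypothetical bucketing: group elements under one parent document keyed by a
# prefix of the hash, so the _id index has far fewer entries.
BUCKET_PREFIX_BYTES = 8  # hypothetical: first 8 bytes of the hash pick the bucket

def store_bucketed(col, full_hash: bytes, payload: dict):
    bucket_id = Binary(full_hash[:BUCKET_PREFIX_BYTES])
    # each element lives in a sub-document keyed by the hex of the remaining bytes
    suffix = full_hash[BUCKET_PREFIX_BYTES:].hex()
    col.update_one(
        {"_id": bucket_id},
        {"$set": {f"elems.{suffix}": payload}},
        upsert=True,
    )

def fetch_bucketed(col, full_hash: bytes):
    suffix = full_hash[BUCKET_PREFIX_BYTES:].hex()
    bucket = col.find_one(
        {"_id": Binary(full_hash[:BUCKET_PREFIX_BYTES])},
        {f"elems.{suffix}": 1},  # project only the element we want
    )
    if bucket:
        return bucket.get("elems", {}).get(suffix)
    return None
```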
Unlike other questions on SO, I'm not only interested in how much data we can stuff into Mongo. It can clearly manage the amount of data we're looking at. My concern is how we can maximize the speed of find operations on huge collections, given constrained RAM.
Our searches will tend to be clustered; around 50,000 elements will satisfy about 50% of the queries, but the remaining 50% will be randomly distributed across all of the data. Can we expect a performance gain by moving those 50,000 into their own collection, in order to keep a smaller index of the most-used data always in RAM?
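Concretely, I'm picturing a read path like this (the hot/cold collection names and the fall-through logic are hypothetical):

```python
# Hypothetical hot/cold split: check the small, always-cached collection first,
# then fall back to the large collection on a miss.
def fetch_split(db, hashed_id):
    doc = db.elements_hot.find_one({"_id": hashed_id})
    if doc is None:
        doc = db.elements_cold.find_one({"_id": hashed_id})
    return doc
```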
Would reducing the size of the _id field from 20 bytes to 8 bytes have a significant impact on MongoDB's indexing speed?
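For example, something like truncating the hash to 8 bytes and storing it as a 64-bit integer (ignoring the increased collision risk for the moment):

```python
import hashlib
from bson.int64 import Int64

def short_id(key: bytes) -> Int64:
    # first 8 bytes of the SHA-1 digest, interpreted as a signed 64-bit integer
    digest = hashlib.sha1(key).digest()[:8]
    return Int64(int.from_bytes(digest, "big", signed=True))
```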