MongoDB - Is it possible to query by associative array key?

I need to store some data that is essentially just an array of key-value pairs of date/ints, where the dates will always be unique.

I'd like to be able to store it like an associative array:

array(
    "2012-02-26" => 5,
    "2012-02-27" => 2,
    "2012-02-28" => 17,
    "2012-02-29" => 4
)

but I also need to be able to query by date (i.e., get everything where date > 2012-02-27), so I suspect I'll need a schema more like:

array(
    array("date"=>"2012-02-26", "value"=>5),
    array("date"=>"2012-02-27", "value"=>2),
    array("date"=>"2012-02-28", "value"=>17),
    array("date"=>"2012-02-29", "value"=>4),
)

Obviously the former is much cleaner and more concise, but will I be able to query it the way I want? If not, are there any other schemas that may be more suitable?

You've described two methods. Let me break them down.

Method #1 - Associative Array

The key tool for querying by "associative array" key is the $exists operator, which tests whether a given field is present in a document.

So you can definitely run a query like the following:

db.coll.find( { 'field.2012-02-27': { $exists: true } } );

Based on your description, you are looking for range queries, which do not match up well with the $exists operator: it can only test for one known key at a time. The "associative array" version is also difficult to index, because every date becomes a separate field name.
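
To make the limitation concrete, here is a minimal plain-JavaScript sketch that runs without a database. The sample document and the helper names `existsFilter` and `valuesAfter` are hypothetical, chosen only for illustration. It shows that an $exists filter can be built for one known date, while a date range ends up being resolved by walking the keys in application code:

```javascript
// Hypothetical document using the "associative array" layout from the question.
const doc = {
  field: {
    "2012-02-26": 5,
    "2012-02-27": 2,
    "2012-02-28": 17,
    "2012-02-29": 4
  }
};

// Build the filter for "does this document have an entry for `date`?".
// The returned object is what would be passed to db.coll.find(...).
function existsFilter(date) {
  return { ["field." + date]: { $exists: true } };
}

// There is no simple query operator for a range over key names,
// so this sketch applies the range client-side.
// ISO-8601 date strings (YYYY-MM-DD) sort correctly as plain strings.
function valuesAfter(document, date) {
  return Object.entries(document.field)
    .filter(([d]) => d > date)
    .map(([d, value]) => ({ date: d, value }));
}
```

Note that `valuesAfter` only works once the whole document has already been fetched, which is exactly the cost the range query was supposed to avoid.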

Method #2 - Array of objects

This definitely has better querying functionality:

db.coll.find( { 'field.date': { $gt: '2012-02-27' } } );

It can also be indexed (createIndex replaces the older ensureIndex helper in current shells):

db.coll.createIndex( { 'field.date': 1 } );

However, there is a trade-off on updating. If you want to increment the value for a specific date, you have to use the unwieldy $ positional operator, which stands for the array element matched by the query. This works for an array of objects, but it fails for anything with further nesting.
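
As a rough illustration of that trade-off, the sketch here builds the filter and update documents for both layouts as plain JavaScript objects. The helper names `incrementInArray` and `incrementInMap` and the `id` parameter are assumptions made for this example, not part of any MongoDB API; only `$inc` and the positional `$` are real operators.

```javascript
// Filter + update pair for the array-of-objects layout (Method #2).
// The "$" in "field.$.value" stands for the index of the first array
// element matched by the "field.date" condition in the filter.
function incrementInArray(id, date, amount) {
  return {
    filter: { _id: id, "field.date": date },
    update: { $inc: { "field.$.value": amount } }
  };
}

// Same increment for the associative layout (Method #1): the date is part
// of the field path, so no positional operator is needed.
function incrementInMap(id, date, amount) {
  return {
    filter: { _id: id },
    update: { $inc: { ["field." + date]: amount } }
  };
}

// In the mongo shell these would be used roughly as:
//   const op = incrementInArray("doc1", "2012-02-27", 1);
//   db.coll.update(op.filter, op.update);
```

The Method #2 version needs the date in the filter so that `$` has an element to point at; if no element matches, the update does nothing rather than adding a new entry.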

Other issues

One issue with either of these methods is the long-term growth of data. As the object grows, it takes more space on disk and in memory. If you have an object with two years' worth of data, that entire array of roughly 700 items will need to be in memory for you to update the data for today. This may not be an issue for your specific data, but it should be considered.

In the same vein, MongoDB queries always return the top-level object. Again, if you have an array of 700 items, you will get all of them for each document that matches. There are ways to filter out the fields that are returned, but they don't work for "arrays of objects".
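
The "whole document comes back" behaviour can be sketched in plain JavaScript without a database. The sample documents and the helpers `findAfter` and `trimToRange` are hypothetical; `findAfter` only mimics the matching rule of the query shown earlier and is not how the server is implemented.

```javascript
// Hypothetical documents in the array-of-objects layout.
const docs = [
  { _id: 1, field: [
    { date: "2012-02-26", value: 5 },
    { date: "2012-02-28", value: 17 }
  ] },
  { _id: 2, field: [
    { date: "2012-02-25", value: 3 }
  ] }
];

// Mimics the matching rule of { 'field.date': { $gt: date } }:
// a document matches if ANY array element satisfies the condition,
// and the matching document is returned whole, non-matching elements included.
function findAfter(collection, date) {
  return collection.filter(doc => doc.field.some(e => e.date > date));
}

// One option is to drop the unwanted elements in application code
// after the documents have been fetched.
function trimToRange(doc, date) {
  return { ...doc, field: doc.field.filter(e => e.date > date) };
}
```

With these samples, `findAfter(docs, "2012-02-27")` returns only the first document, but with both of its array elements; `trimToRange` then reduces it to the single entry after the cut-off date.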
