MongoDB - Is it possible to query by associative array key?

I need to store some data that is essentially just an array of key-value pairs of date/ints, where the dates will always be unique.

I'd like to be able to store it like an associative array:

array(
    "2012-02-26" => 5,
    "2012-02-27" => 2,
    "2012-02-28" => 17,
    "2012-02-29" => 4
)

but I also need to be able to query the dates (i.e. get everything where date > 2012-02-27), and so I suspect that I'll need to use a schema more like:

array(
    array("date"=>"2012-02-26", "value"=>5),
    array("date"=>"2012-02-27", "value"=>2),
    array("date"=>"2012-02-28", "value"=>17),
    array("date"=>"2012-02-29", "value"=>4),
)

Obviously the former is much cleaner and more concise, but will I be able to query it the way I want? If not, are there any other schemas that may be more suitable?

Tenuous answered 29/2, 2012 at 20:49 Comment(2)
php.net/manual/en/mongo.queries.php - Lewellen
I'd not store the dates in that format. Use the time() function. It will make it easier to sort dates and such. php.net/manual/en/function.time.php - Lewellen

You've described two methods; let me break them down.

Method #1 - Associative Array

The key tool for querying by "associative array" key is the $exists operator.

So you can definitely run a query like the following:

db.coll.find( { 'field.2012-02-27': { $exists: true } } );

Based on your description, you are looking for range queries, which do not match up well with the $exists operator: it can only test whether one specific key is present. The "associative array" version is also difficult to index.
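To illustrate why Method #1 fits range queries poorly: the dates are field names rather than values, so a range filter has to be applied on the client after the whole document has been fetched. Below is a minimal sketch in plain JavaScript, assuming the document is already in memory; the function name `entriesAfter` is hypothetical and `field` is the sub-document name used in the query above.

```javascript
// Document shape for Method #1: the dates are keys of `field`.
const doc = {
  field: {
    '2012-02-26': 5,
    '2012-02-27': 2,
    '2012-02-28': 17,
    '2012-02-29': 4,
  },
};

// Hypothetical helper: keep only the entries whose key is after `minDate`.
// The keys are zero-padded "YYYY-MM-DD" strings, so a plain string
// comparison orders them chronologically.
function entriesAfter(document, minDate) {
  return Object.entries(document.field)
    .filter(([date]) => date > minDate)
    .map(([date, value]) => ({ date, value }));
}

const recent = entriesAfter(doc, '2012-02-27');
console.log(recent);
```

The filtering itself is trivial, but it runs only after every key has been transferred to the client, which is the cost the server-side query in Method #2 avoids.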

Method #2 - Array of objects

This definitely has better querying functionality:

db.coll.find( { 'field.date': { $gt: '2012-02-27' } } );

It can also be indexed:

db.coll.createIndex( { 'field.date': 1 } );
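Because the dates are stored as strings, the $gt comparison above is a string comparison. The sketch below emulates that comparison in plain JavaScript (no database involved) to show when it agrees with calendar order. The sample array mirrors the data from the question; the variable names are hypothetical.

```javascript
// One document's `field` array in the Method #2 layout.
const field = [
  { date: '2012-02-26', value: 5 },
  { date: '2012-02-27', value: 2 },
  { date: '2012-02-28', value: 17 },
  { date: '2012-02-29', value: 4 },
];

// The query matches a document when at least one element satisfies the condition.
const documentMatches = field.some((entry) => entry.date > '2012-02-27');

// The elements that satisfy the condition, using the same string comparison.
const matching = field.filter((entry) => entry.date > '2012-02-27');

// Pitfall: without zero padding, string order stops following date order.
// 9 February is earlier than 27 February, yet this comparison is true.
const unpaddedSortsLater = '2012-2-9' > '2012-02-27';

console.log(documentMatches, matching.map((e) => e.date), unpaddedSortsLater);
```

As long as every date is written as zero-padded YYYY-MM-DD, the string range query behaves as expected. Storing timestamps or native date values, as suggested in the comments on the question, avoids the pitfall altogether.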

However, there is a trade-off when updating. If you want to increment the value for a specific date, you have to use the unwieldy $ positional operator. This works for an array of objects, but it fails for anything with further nesting.
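As a rough sketch of what such an update looks like, the snippet below builds the filter and update documents for incrementing one date's value. It only constructs plain objects, so it runs without a database. The helper name `buildIncrement` and the `_id` value `'doc-1'` are hypothetical; `field`, `date`, and `value` follow the examples above.

```javascript
// Hypothetical helper: build the filter and update documents for
// incrementing the value stored for one date in the Method #2 layout.
function buildIncrement(docId, date, amount) {
  // The filter has to match the array element, so that `$` can refer to it.
  const filter = { _id: docId, 'field.date': date };
  // `field.$.value` addresses `value` in the first element matched by the filter.
  const update = { $inc: { 'field.$.value': amount } };
  return { filter, update };
}

const { filter, update } = buildIncrement('doc-1', '2012-02-27', 1);
console.log(JSON.stringify(filter), JSON.stringify(update));

// In the mongo shell, these would be passed to the collection as:
//   db.coll.update(filter, update);
```

Note that the date appears in the filter rather than in the update path, which is part of what makes the operator awkward to use.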

Other issues

One issue with either of these methods is the long-term growth of data. As the object grows, it takes more space on disk and in memory. If you have an object with two years' worth of data, that entire array of 700 items will need to be in memory for you to update the data for today. This may not be an issue for your specific data, but it should be considered.

In the same vein, MongoDB queries always return the top-level object. Again, if you have an array of 700 items, you will get all of them for each document that matches. There are ways to filter out the fields that are returned, but they don't work for "arrays of objects".

Tympanic answered 1/3, 2012 at 1:25 Comment(4)
"If you have an object with two years worth of data that entire array of 700 items will need to be in memory for you to update data for today" - what if I were to use $slice to remove a subset of the array and $push to append a new item to the array? - Tenuous
So that works for returning the data, but only if you push in date order. However, on the server side, it still needs to pull the entire object into memory even if it only passes you a piece of it. - Tympanic
Oh right I didn't realise that, I figured it was smart enough to remove only the specified subset. That's a bit of a dealbreaker for my current project, thanks for stopping me getting 6 months down the line and realising it then! - Tenuous
Yeah, it's a side effect of the way the BSON format was created. It's very serial, so it needs to read blocks "front to back" in order to correctly de-serialize the object. - Tympanic
