MongoDB index not helping query with multikey index
Asked Answered
I

1

6

I have a collection of documents with a multikey index defined. However, the performance of the query is pretty poor for just 43K documents. Is ~215ms for this query considered poor? Did I define the index correctly if nscanned is 43902 (which equals the total documents in the collection)?

Document:

{
    "_id": {
        "$oid": "50f7c95b31e4920008dc75dc"
    },
    "bank_accounts": [
        {
            "bank_id": {
                "$oid": "50f7c95a31e4920009b5fc5d"
            },
            "account_id": [
                "ff39089358c1e7bcb880d093e70eafdd",
                "adaec507c755d6e6cf2984a5a897f1e2"
            ]
        }
    ],
    "created_date": "2013,01,17,09,50,19,274089",
}

Index:

{ "bank_accounts.bank_id" : 1 , "bank_accounts.account_id" : 1}

Query:

db.visitor.find({ "bank_accounts.account_id" : "ff39089358c1e7bcb880d093e70eafdd" , "bank_accounts.bank_id" : ObjectId("50f7c95a31e4920009b5fc5d")}).explain()

Explain:

{
    "cursor" : "BtreeCursor bank_accounts.bank_id_1_bank_accounts.account_id_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 43902,
    "nscanned" : 43902,
    "nscannedObjectsAllPlans" : 43902,
    "nscannedAllPlans" : 43902,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 213,
    "indexBounds" : {
        "bank_accounts.bank_id" : [
            [
                ObjectId("50f7c95a31e4920009b5fc5d"),
                ObjectId("50f7c95a31e4920009b5fc5d")
            ]
        ],
        "bank_accounts.account_id" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "Not_Important"
}
Identic answered 17/2, 2013 at 0:29 Comment(8)
One idea: Try flipping the order of parameters to the find to match the order specified by the index. Also, if you just do a find for bank_id, does that use the index? It should, given the order specified in ensureIndex.Proven
@WiredPrairie: Adding a second index matching the order the find parameters didn't help. And yes, if I just do a find for bank_id, an index is used (but still scans 43K documents).Identic
Could you just create a single index { "bank_accounts.bank_id" : 1 } and try just the bank_id? It's very strange that it's not using the index (you can delete the index after the test).Proven
I just created a simple test locally with 1000 documents that mirror the structure you have above, and did the queries you've shown above. The search scans 2 documents and returns 1 match. Are you using the most recent builds of mongodb?Proven
@WiredPrairie: We are on v2.2.2.Identic
@WiredPrairie: Adding the index { "bank_accounts.bank_id" : 1 } doesn't make find() use that index. Explain() still shows the compound index being used for find( {"bank_accounts.bank_id":ObjectId("50f7c95a31e4920009b5fc5d")}). So I removed the original compound index leaving just the one you suggested. Explain() still shows all documents being scanned (using the single element index).Identic
is there any other detail you could provide that might be helpful? I couldn't replicate the problem you're seeing. Try building a new collection one by one and see if the problem persists even in a new collection.Proven
let us continue this discussion in chatIdentic
D
8

I see three factors in play.

First, for application purposes, make sure that $elemMatch isn't a more appropriate query for this use-case. http://docs.mongodb.org/manual/reference/operator/elemMatch/. It seems like it would be bad if the wrong results came back due to multiple subdocuments satisfying the query.

Second, I imagine the high nscanned value can be accounted for by querying on each of the field values independently. .find({ bank_accounts.bank_id: X }) vs. .find({"bank_accounts.account_id": Y}). You may see that nscanned for the full query is about equal to nscanned of the largest subquery. If the index key were being evaluated fully as a range, this would not be expected, but...

Third, the { "bank_accounts.account_id" : [[{"$minElement" : 1},{"$maxElement" : 1}]] } clause of the explain plan shows that no range is being applied to this portion of the key.

Not really sure why, but I suspect it has something to do with account_id's nature (an array within a subdocument within an array). 200ms seems about right for an nscanned that high.

A more performant document organization might be to denormalize the account_id -> bank_id relationship within the subdocument, and store:

{"bank_accounts": [
{
 "bank_id": X,
 "account_id: Y,
},
{
 "bank_id": X,
 "account_id: Z,
}
]}

instead of: {"bank_accounts": [{ "bank_id": X, "account_id: [Y, Z], }]}

My tests below show that with this organization, the query optimizer gets back to work and exerts a range on both keys:

> db.accounts.insert({"something": true, "blah": [{ a: "1", b: "2"} ] })
> db.accounts.ensureIndex({"blah.a": 1, "blah.b": 1})
> db.accounts.find({"blah.a": 1, "blah.b": "A RANGE"}).explain()
{
    "cursor" : "BtreeCursor blah.a_1_blah.b_1",
    "isMultiKey" : false,
    "n" : 0,
    "nscannedObjects" : 0,
    "nscanned" : 0,
    "nscannedObjectsAllPlans" : 0,
    "nscannedAllPlans" : 0,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "blah.a" : [
            [
                1,
                1
            ]
        ],
        "blah.b" : [
            [
                "A RANGE",
                "A RANGE"
        ]
    ]
    }
}
Doretha answered 19/2, 2013 at 21:52 Comment(1)
Adding to the above, this is what Mongo's FAQs suggests as well - docs.mongodb.org/manual/faq/indexes/…Mohan

© 2022 - 2024 — McMap. All rights reserved.