Can $in / $or queries use indexes?
Asked Answered
A

1

8

I'm playing with Mango queries on a CouchDB 2.0 instance, through the fantastic pouchdb-find.

A few times I got the dreaded no matching index found, create an index to optimize query time warning even though I was using indexed fields.

Just now I got it when selecting "type": {"$in": ["a", "b"]} or the equivalent "$or": [{"type": "a"}, {"type": "b"}] , even though an index on type exists.

Googling (cloudant query docs, pouchdb-find docs, SO question) didn't help, and in the latter @nlawson says that some predicates ($ne in the aforementioned question, but maybe my $in / $or fall into the same basket?) "currently do not use any index".

  • If I'm indeed in the same boat, what does that mean? Is the impossibility to use indexes on queries using certain predicates a limitation of the mango backend, or a pouchdb one?
  • Am I doing something wrong / is there an index workaround to avoid this?
    • More generally, is there documentation I could read to get a deeper grip on how indexes work and how to troubleshoot them?

Thanks!

Antecedency answered 18/11, 2016 at 21:42 Comment(4)
Can you share the full call to find? Are you specifying a sort? This may be a pouchdb-find limitation. I'm not seeing this issue when testing directly on Cloudant. Could you change your query to something like this: selector: { _id: {"$gt":0}, type: {"$in": ["a", "b"]} }Poetry
@Poetry that was the full query (no sorting, just a bare selector on type). And adding _id: {"$gt":0} did the trick oÔ. 1. Can you explain what's going on? 2. Feel free to convert your comment to an answer, I'll accept it. Thanks!Antecedency
I am also confused as to why you need to include _id in a query when i created an index on selector.Southern
@EricShell I asked about this _id trick in this couchdb-user mailing-list thread. Turns out: it's not a way to use an index, quite the contrary: it's only a way to explicitly tell pouchdb-find I know I'll be running without index, because my query won't fit into CouchDB's final map/reduce + composite key-based indexing, thus requiring fetching allDocs and doing in-memory filtering. I'll let this settle for a while, and will answer my question with a "no".Antecedency
A
10

Answering my question: no, $in/or queries cannot use indices. I asked the question in this user@couchdb mailing list thread, where Garren Smith answered and pointed to Understanding Mango View-Based Indexes vs. Search-Based Indexes and A look under the covers of PouchDB-find. To quote Garren,

The reason that adding "_id": {"$gt": 0} works is because pouchdb-find/mango fetches all the docs using the _all_docs index and then processes the $in operator in memory.

If you have a large database this will hurt. But you can use a better value than 0 to reduce the number of documents that need to be sorted in memory, which is a good thing.

So, careful, "_id": {"$gt": 0}is by no means a way to use an index (that's the impression I got when @markwatsonatx suggested it), it's only a way to suppress pouchdb-find's warning, by telling it "I know this won't fit a map/reduce, I'll be running in-memory operations on allDocs, and I'm aware of the perf. consequences". Also,

The warning is just to help anyone new to using Mango that what they are doing isn't the best way on a large database but will be fine on a small database. It's a fine way to experiment but once you start noticing performance issues, creating an index is the way forward.

I'll complement with a little benchmark I made, comparing different approaches to fetch {10, 100, 1000, 10000} "cases" (using selectors on an indexed field) from a db containing [10000 cases, 100000 noise documents]

|number of cases fetched|10 |100 |1000 |10000 | |-----------------------|------|-------|--------|------| |$in |2452ms|2539ms |2474ms |5032ms| |$in + $gt |905ms |784ms |1014ms |3805ms| |$in + $gt + $lt |5ms |13ms |100ms |3854ms| |$or |2638ms|11763ms|101279ms|- |

Antecedency answered 27/1, 2017 at 15:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.