MongoDB $lookup pipeline: does this use indexes?
Asked Answered
N

2

6

I have a query that makes use of the pipeline feature of $lookup and it also uses $expr. Well, it works, but the performance is not great. It looks up stuff in a collection with about 4000 docs and joins 2 other collections (using $lookup blocks). It takes about 2000ms to run, even though there are only a few thousand documents in each collection.

The query looks something like this:

            {
                $match: {
                   language: 'str'
                }
            },
            {
                $lookup: {
                    from: 'somecollection',
                    let: { someId: '$someId' },
                    pipeline: [
                        {
                            $match: {
                                $expr: {
                                    $and: [
                                        {
                                            $eq: [
                                                '$_id',
                                                '$$someId'
                                            ]
                                        },
                                        {
                                            $gte: ['$field',value]
                                        },
                                        {
                                            $lte: ['$field2',value]
                                        }
                                       ....
                                       // some more conditions..

                                    ]
                                }
                            }
                        }
                    ]

Running explain() on this provides info about the first $match block only. But how can figure out whether the $expr in the pipeline is using an index or not?

I tried to add indexes to all fields used in the pipeline, and I also tried to create a compound index, but I couldn't make it any faster.

How could i improve performance?

The structure of my query:

match (filter by language),
lookup (col1 join)
lookup (col2 join)
project,
sort

I'm using Mongo 4.0

EDIT:

Actually, can the ops in the pipeline use an index or not? I've heard that only $eq is able to use indexes, the other ops can't. Is this true???

Thanks in advance!

Neediness answered 2/3, 2020 at 21:32 Comment(8)
how many documents does the first $match stage produce? i'm guessing a couple of thousand? if so, the lookup pipeline will be executed for each of those documents. this is probably where the performance issue comes from. maybe lookups/joins may not be the best option for what you're trying to do here.Septarium
you could extract the lookup pipeline and run it by itself as the first stage with some static data and use explain to see if it does IXSCANs. if it does, then your problem is not with index usage but having to execute too many pipelines per each source document from the first stage.Septarium
@RyanGunner thanks for the info. Is there any other way to perform a join than aggregation? As far as I know, there isn't. I have a collection (let's call it A) which holds thousands of items whose "someId" field relates to either collection B or C. I need to select all items from A with their related items from B or C populated. Is there a better way to do this than using aggregation query?Neediness
may i ask, what are you doing with thousands of records from A? are you sending thousands of items to the UI of your app? if so, why not do paging? say 100 A items per batch? that way your lookups will be fast.Septarium
@RyanGunner I can't use pagination because this is an RSS / Atom feed. Feed readers just expect a valid XML document, they cannot handle pagination. What I could use though might be a HTTP stream, so I'd keep pushing new items constantly as long as AggregationCursor yields results. But this would require a kinda bigger refact, and I'm not even sure Loopback 3 could do it. So I'm looking for an easier solution for now :)Neediness
@RyanGunner Btw, problems is the DB query is what's slow, not the NodeJS layer. The $lookup is slow and the subqueries cannot even use an index (except for $eq ops). But there is not other way to perform a join in Mongodb.. :(Neediness
Let us continue this discussion in chat.Septarium
@user2297996, could you post an example document from col1, col2, col3 and the desired output ?Raven
I
2

It is supposed to, but there have been some bugs reported (and fixed) related to this:

https://jira.mongodb.org/browse/SERVER-31760

Make sure you're using the latest version and experiment.

Inessential answered 23/9, 2020 at 22:49 Comment(0)
P
-3
{$lookup:
 {
   from: <collection to join>,
   localField: <field from the input documents>,
   foreignField: <field from the documents of the "from" collection>,
   as: <output array field>
 }
}
Physiology answered 3/3, 2020 at 4:18 Comment(1)
Thx, but this is only possible when you join on a single key. You can't use pipeline with this approach. So I don't think this is the right solutionNeediness

© 2022 - 2024 — McMap. All rights reserved.