Mongoose Populate after Aggregate
Asked Answered
H

2

13

I am trying to get a specific data model after I run an aggregate pipeline followed by populate but I am falling just short of it.

The desired result in the end is the following:

[
  {
    _accountId: "5beee0966d17bc42501f1234",
    name: "Company Name 1",
    contactEmail: "[email protected]",
    contactName: "contact Name 1"
    reason: "Warranties",
    total: 1152,
    lineItems: [
      {
        _id: "5beee0966d17bc42501f5086",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf43929e7179a56e21382bc",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf4392fe7179a56e21382bd",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      }
    ]
  },
  {
    _accountId: "5beee0966d17bc42501f1235",
    name: "Company Name 2",
    contactEmail: "[email protected]",
    contactName: "contact Name 2"
    reason: "Warranties",
    total: 1152,
    lineItems: [
      {
        _id: "5beee0966d17bc42501f5086",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf43929e7179a56e21382bc",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf4392fe7179a56e21382bd",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      }
    ]
  }
]

I am collecting this data from the following two models:

Warranty

{
  _id: "5beee0966d17bc42501f5086",
  jobsiteAddressStreet: String,
  jobsiteAddressCity: String,
  jobsiteAddressState" String,
  jobsiteAddressZip: Number,
  warrantyFee: Number,
  _accountId: {
    type: Schema.Types.ObjectId,
    ref: "accounts"
  },
  payStatus: String
}

Account

{
  _id: "5beee0966d17bc42501f1235",
  name: String,
  contactName: String,
  contactEmail: String
}

My current query is the following:

Warranty.aggregate([
    {
      $match: {
        payStatus: "Invoiced Next Billing Cycle"
      }
    },
    {
      $group: {
        _id: "$_accountId",
        total: {
          $sum: "$warrantyFee"
        },
        lineItems: {
          $push: {
            _id: "$_id",
            jobsiteAddress: {
              $concat: [
                "$jobsiteAddressStreet",
                " ",
                "$jobsiteAddressCity",
                ", ",
                "$jobsiteAddressState",
                " ",
                "$jobsiteAddressZip"
              ]
            },
            warrantyFee: "$warrantyFee"
          }
        }
      }
    },
    {
      $project: {
        reason: "Warranties",
        total: "$total",
        lineItems: "$lineItems"
      }
    }
  ])
    .then(warranties => {
      console.log(warranties);
      Account.populate(warranties, {
        path: "_id",
        select: "contactName contactEmail name"
      })
        .then(warranties => {
          res.send(warranties);
        })
        .catch(err => {
          res.status(422).send(err);
          throw err;
        });
    })
    .catch(err => {
      res.status(422).send(err);
      throw err;
    });

Which results in the following:

[
  {
    _id: {
      _id: "5bc39dfa331c0e2cb897b61e",
      name: "Company Name 1",
      contactEmail: "[email protected]",
      contactName: "Contact Name 1"
    },
    reason: "Warranties",
    total: 1152,
    lineItems: [
      {
        _id: "5beee0966d17bc42501f5086",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf43929e7179a56e21382bc",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf4392fe7179a56e21382bd",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      }
    ]
  },
  {
    _id: {
      _id: "5bc39dfa331c0e2cb897b61e",
      name: "Company Name 2",
      contactEmail: "[email protected]",
      contactName: "Contact Name 2"
    },
    reason: "Warranties",
    total: 1152,
    lineItems: [
      {
        _id: "5beee0966d17bc42501f5086",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf43929e7179a56e21382bc",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf4392fe7179a56e21382bd",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      }
    ]
  }
]

As you can see this is really close with some minor issues.

  1. It shows _id instead of _accountId. I defaulted to this because whenever I try to return the _accountId in $group it labels it as a non-accumulator field and when I do it in $project it just doesn't show up. The data sets must be grouped by the _accountId in the Warranty model.
  2. I would prefer to add the additional (contactName, contactEmail, name) fields to the top level object instead of creating a subdocument if possible. This may be simple or impossible as I am not super familiar with populate but can't find anything to answer my question directly.

The goal at the end of this is to take the returned object and use the array of objects to do a bulk create of documents to another collection.

-- Answer to my specific use case --

Warranty.aggregate([
    {
      $match: {
        payStatus: "Invoiced Next Billing Cycle"
      }
    },
    {
      $group: {
        _id: "$_accountId",
        total: {
          $sum: "$warrantyFee"
        },
        lineItems: {
          $push: {
            _id: "$_id",
            jobsiteAddress: {
              $concat: [
                "$jobsiteAddressStreet",
                " ",
                "$jobsiteAddressCity",
                ", ",
                "$jobsiteAddressState",
                " ",
                "$jobsiteAddressZip"
              ]
            },
            warrantyFee: "$warrantyFee"
          }
        }
      }
    },
    {
      $lookup: {
        from: Account.collection.name,
        localField: "_id",
        foreignField: "_id",
        as: "accounts"
      }
    },
    {
      $unwind: "$accounts"
    },
    {
      $project: {
        lineItems: "$lineItems",
        reason: "Warranties",
        total: "$total",
        type: "Invoice",
        date: new Date(),
        company: "$accounts.name",
        contactName: "$accounts.contactName",
        contactEmail: "$accounts.contactEmail"
      }
    },
    {
      $addFields: {
        _accountId: "$_id"
      }
    },
    {
      $project: {
        _id: 0
      }
    }
  ])

This gives me the result:

[
  {
    lineItems: [
      {
        _id: "5be203eb3afd8098d4988152",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      }
    ],
    reason: "Warranties",
    total: 384,
    type: "Invoice",
    date: "2018-11-21T14:08:15.052Z",
    company: "Company Name 1",
    contactName: "Contact Name 1",
    contactEmail: "[email protected]",
    _accountId: "5be203eb3afd8098d4988152",
    referenceNumber: 1542809296615
  },
  {
    lineItems: [
      {
        _id: "5beee0966d17bc42501f5086",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf43929e7179a56e21382bc",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      },
      {
        _id: "5bf4392fe7179a56e21382bd",
        jobsiteAddress: "1234 Street Southwest Sunnyville, Wyoming 12345",
        warrantyFee: 384
      }
    ],
    reason: "Warranties",
    total: 1152,
    type: "Invoice",
    date: "2018-11-21T14:08:15.052Z",
    company: "Company Name 2",
    contactName: "Contact Name 2",
    contactEmail: "[email protected]",
    _accountId: "5bc39dfa331c0e2cb897b61e",
    referenceNumber: 1542809295680
  }
]
Hieratic answered 20/11, 2018 at 18:58 Comment(0)
C
42

So you are actually missing some concepts here when you ask to "populate" on an aggregation result. Typically this is not what you actually do, but to explain the points:

  1. The output of aggregate() is unlike a Model.find() or similar action since the purpose here is to "reshape the results". This basically means that the model you are using as the source of the aggregation is no longer considered that model on output. This is even true if you still maintained the exact same document structure on output, but in your case the output is clearly different to the source document anyway.

    At any rate it's no longer an instance of the Warranty model you are sourcing from, but just a plain object. We can work around that as we touch on later.

  2. Probably the main point here is that populate() is somewhat "old hat" anyway. This is really just a convenience function added to Mongoose back in the very early days of implementation. All it really does is execute "another query" on the related data in a separate collection, and then merges the results in memory to the original collection output.

    For a lot of reasons, that's not really efficient or even desirable in most cases. And contrary to the popular misconception, this is NOT actually a "join".

    For a real "join" you actually use the $lookup aggregation pipeline stage, which MongoDB uses to return the matching items from another collection. Unlike populate() this is actually done in a single request to the server with a single response. This avoids network overheads, is generally faster and as a "real join" allows you to do things that populate() cannot do.

Use $lookup instead

The very quick version of what is missing here is that instead of attempting to populate() in the .then() after the result is returned, what you do instead is add the $lookup to the pipeline:

  { "$lookup": {
    "from": Account.collection.name,
    "localField": "_id",
    "foreignField": "_id",
    "as": "accounts"
  }},
  { "$unwind": "$accounts" },
  { "$project": {
    "_id": "$accounts",
    "total": 1,
    "lineItems": 1
  }}

Note that there is a constraint here in that the output of $lookup is always an array. It does not matter if there is only one related item or many to be fetched as output. The pipeline stage will look for value of the "localField" from the current document presented and use that to match values in the "foreignField" specified. In this case it's the _id from the aggregation $group target to the _id of the foreign collection.

Since the output is always an array as mentioned, the most efficient way to work with this for this instance would be to simply add an $unwind stage directly following the $lookup. All this is going to do it return a new document for each item returned in the target array, and in this case you expect it to be one. In the case where the _id is not matched in the foreign collection, the results with no matches would be removed.

As a small note, this is actually an optimized pattern as described in $lookup + $unwind Coalescence within the core documentation. A special thing happens here where the $unwind instruction is actually merged into the $lookup operation in an efficient way. You can read more about that there.

Using populate

From the above content you should be able to basically understand why populate() here is the wrong thing to do. Aside from the basic fact that the output is no longer comprised of Warranty model objects, that model really only knows about foreign items described on the _accountId property which does not exist in the output anyway.

Now you can actually define a model which can be used in order to explicitly cast the output objects into a defined output type. A short demonstration of one would involve adding code to your application for this like:

// Special models

const outputSchema = new Schema({
  _id: { type: Schema.Types.ObjectId, ref: "Account" },
  total: Number,
  lineItems: [{ address: String }]
});

const Output = mongoose.model('Output', outputSchema, 'dontuseme');

This new Output model can then be used in order to "cast" the resulting plain JavaScript objects into Mongoose Documents so that methods like Model.populate() can actually be called:

// excerpt
result2 = result2.map(r => new Output(r));   // Cast to Output Mongoose Documents

// Call populate on the list of documents
result2 = await Output.populate(result2, { path: '_id' })
log(result2);

Since Output has a schema defined that is aware of the "reference" on the _id field of it's documents the Model.populate() is aware of what it needs to do and returns the items.

Beware though since this actually generates another query. i.e:

Mongoose: warranties.aggregate([ { '$match': { payStatus: 'Invoiced Next Billing Cycle' } }, { '$group': { _id: '$_accountId', total: { '$sum': '$warrantyFee' }, lineItems: { '$push': { _id: '$_id', address: { '$trim': { input: { '$reduce': { input: { '$objectToArray': '$address' }, initialValue: '', in: { '$concat': [ '$$value', ' ', [Object] ] } } }, chars: ' ' } } } } } } ], {})
Mongoose: accounts.find({ _id: { '$in': [ ObjectId("5bf4b591a06509544b8cf75c"), ObjectId("5bf4b591a06509544b8cf75b") ] } }, { projection: {} })

Where the first line is the aggregate output, and then you are contacting the server again in order to return the related Account model entries.

Summary

So those are your options, but it should be pretty clear that the modern approach to this is instead to use $lookup and get a real "join" which is not what populate() is actually doing.

Included is a listing as a full demonstration of how each of these approaches actually work in practice. Some artistic licence is taken here, so the models represented may not be exactly the same as what you have, but there is enough there to demonstrate the basic concepts in a reproducible way:

const { Schema } = mongoose = require('mongoose');

const uri = 'mongodb://localhost:27017/joindemo';
const opts = { useNewUrlParser: true };

// Sensible defaults
mongoose.Promise = global.Promise;
mongoose.set('debug', true);
mongoose.set('useFindAndModify', false);
mongoose.set('useCreateIndex', true);

// Schema defs

const warrantySchema = new Schema({
  address: {
    street: String,
    city: String,
    state: String,
    zip: Number
  },
  warrantyFee: Number,
  _accountId: { type: Schema.Types.ObjectId, ref: "Account" },
  payStatus: String
});

const accountSchema = new Schema({
  name: String,
  contactName: String,
  contactEmail: String
});

// Special models


const outputSchema = new Schema({
  _id: { type: Schema.Types.ObjectId, ref: "Account" },
  total: Number,
  lineItems: [{ address: String }]
});

const Output = mongoose.model('Output', outputSchema, 'dontuseme');

const Warranty = mongoose.model('Warranty', warrantySchema);
const Account = mongoose.model('Account', accountSchema);


// log helper
const log = data => console.log(JSON.stringify(data, undefined, 2));

// main
(async function() {

  try {

    const conn = await mongoose.connect(uri, opts);

    // clean models
    await Promise.all(
      Object.entries(conn.models).map(([k,m]) => m.deleteMany())
    )

    // set up data
    let [first, second, third] = await Account.insertMany(
      [
        ['First Account', 'First Person', '[email protected]'],
        ['Second Account', 'Second Person', '[email protected]'],
        ['Third Account', 'Third Person', '[email protected]']
      ].map(([name, contactName, contactEmail]) =>
        ({ name, contactName, contactEmail })
      )
    );

    await Warranty.insertMany(
      [
        {
          address: {
            street: '1 Some street',
            city: 'Somewhere',
            state: 'TX',
            zip: 1234
          },
          warrantyFee: 100,
          _accountId: first,
          payStatus: 'Invoiced Next Billing Cycle'
        },
        {
          address: {
            street: '2 Other street',
            city: 'Elsewhere',
            state: 'CA',
            zip: 5678
          },
          warrantyFee: 100,
          _accountId: first,
          payStatus: 'Invoiced Next Billing Cycle'
        },
        {
          address: {
            street: '3 Other street',
            city: 'Elsewhere',
            state: 'NY',
            zip: 1928
          },
          warrantyFee: 100,
          _accountId: first,
          payStatus: 'Invoiced Already'
        },
        {
          address: {
            street: '21 Jump street',
            city: 'Anywhere',
            state: 'NY',
            zip: 5432
          },
          warrantyFee: 100,
          _accountId: second,
          payStatus: 'Invoiced Next Billing Cycle'
        }
      ]
    );

    // Aggregate $lookup
    let result1 = await Warranty.aggregate([
      { "$match": {
        "payStatus": "Invoiced Next Billing Cycle"
      }},
      { "$group": {
        "_id": "$_accountId",
        "total": { "$sum": "$warrantyFee" },
        "lineItems": {
          "$push": {
            "_id": "$_id",
            "address": {
              "$trim": {
                "input": {
                  "$reduce": {
                    "input": { "$objectToArray": "$address" },
                    "initialValue": "",
                    "in": {
                      "$concat": [ "$$value", " ", { "$toString": "$$this.v" } ] }
                  }
                },
                "chars": " "
              }
            }
          }
        }
      }},
      { "$lookup": {
        "from": Account.collection.name,
        "localField": "_id",
        "foreignField": "_id",
        "as": "accounts"
      }},
      { "$unwind": "$accounts" },
      { "$project": {
        "_id": "$accounts",
        "total": 1,
        "lineItems": 1
      }}
    ])

    log(result1);

    // Convert and populate
    let result2 = await Warranty.aggregate([
      { "$match": {
        "payStatus": "Invoiced Next Billing Cycle"
      }},
      { "$group": {
        "_id": "$_accountId",
        "total": { "$sum": "$warrantyFee" },
        "lineItems": {
          "$push": {
            "_id": "$_id",
            "address": {
              "$trim": {
                "input": {
                  "$reduce": {
                    "input": { "$objectToArray": "$address" },
                    "initialValue": "",
                    "in": {
                      "$concat": [ "$$value", " ", { "$toString": "$$this.v" } ] }
                  }
                },
                "chars": " "
              }
            }
          }
        }
      }}
    ]);

    result2 = result2.map(r => new Output(r));

    result2 = await Output.populate(result2, { path: '_id' })
    log(result2);

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()

And the full output:

Mongoose: dontuseme.deleteMany({}, {})
Mongoose: warranties.deleteMany({}, {})
Mongoose: accounts.deleteMany({}, {})
Mongoose: accounts.insertMany([ { _id: 5bf4b591a06509544b8cf75b, name: 'First Account', contactName: 'First Person', contactEmail: '[email protected]', __v: 0 }, { _id: 5bf4b591a06509544b8cf75c, name: 'Second Account', contactName: 'Second Person', contactEmail: '[email protected]', __v: 0 }, { _id: 5bf4b591a06509544b8cf75d, name: 'Third Account', contactName: 'Third Person', contactEmail: '[email protected]', __v: 0 } ], {})
Mongoose: warranties.insertMany([ { _id: 5bf4b591a06509544b8cf75e, address: { street: '1 Some street', city: 'Somewhere', state: 'TX', zip: 1234 }, warrantyFee: 100, _accountId: 5bf4b591a06509544b8cf75b, payStatus: 'Invoiced Next Billing Cycle', __v: 0 }, { _id: 5bf4b591a06509544b8cf75f, address: { street: '2 Other street', city: 'Elsewhere', state: 'CA', zip: 5678 }, warrantyFee: 100, _accountId: 5bf4b591a06509544b8cf75b, payStatus: 'Invoiced Next Billing Cycle', __v: 0 }, { _id: 5bf4b591a06509544b8cf760, address: { street: '3 Other street', city: 'Elsewhere', state: 'NY', zip: 1928 }, warrantyFee: 100, _accountId: 5bf4b591a06509544b8cf75b, payStatus: 'Invoiced Already', __v: 0 }, { _id: 5bf4b591a06509544b8cf761, address: { street: '21 Jump street', city: 'Anywhere', state: 'NY', zip: 5432 }, warrantyFee: 100, _accountId: 5bf4b591a06509544b8cf75c, payStatus: 'Invoiced Next Billing Cycle', __v: 0 } ], {})
Mongoose: warranties.aggregate([ { '$match': { payStatus: 'Invoiced Next Billing Cycle' } }, { '$group': { _id: '$_accountId', total: { '$sum': '$warrantyFee' }, lineItems: { '$push': { _id: '$_id', address: { '$trim': { input: { '$reduce': { input: { '$objectToArray': '$address' }, initialValue: '', in: { '$concat': [ '$$value', ' ', [Object] ] } } }, chars: ' ' } } } } } }, { '$lookup': { from: 'accounts', localField: '_id', foreignField: '_id', as: 'accounts' } }, { '$unwind': '$accounts' }, { '$project': { _id: '$accounts', total: 1, lineItems: 1 } } ], {})
[
  {
    "total": 100,
    "lineItems": [
      {
        "_id": "5bf4b591a06509544b8cf761",
        "address": "21 Jump street Anywhere NY 5432"
      }
    ],
    "_id": {
      "_id": "5bf4b591a06509544b8cf75c",
      "name": "Second Account",
      "contactName": "Second Person",
      "contactEmail": "[email protected]",
      "__v": 0
    }
  },
  {
    "total": 200,
    "lineItems": [
      {
        "_id": "5bf4b591a06509544b8cf75e",
        "address": "1 Some street Somewhere TX 1234"
      },
      {
        "_id": "5bf4b591a06509544b8cf75f",
        "address": "2 Other street Elsewhere CA 5678"
      }
    ],
    "_id": {
      "_id": "5bf4b591a06509544b8cf75b",
      "name": "First Account",
      "contactName": "First Person",
      "contactEmail": "[email protected]",
      "__v": 0
    }
  }
]
Mongoose: warranties.aggregate([ { '$match': { payStatus: 'Invoiced Next Billing Cycle' } }, { '$group': { _id: '$_accountId', total: { '$sum': '$warrantyFee' }, lineItems: { '$push': { _id: '$_id', address: { '$trim': { input: { '$reduce': { input: { '$objectToArray': '$address' }, initialValue: '', in: { '$concat': [ '$$value', ' ', [Object] ] } } }, chars: ' ' } } } } } } ], {})
Mongoose: accounts.find({ _id: { '$in': [ ObjectId("5bf4b591a06509544b8cf75c"), ObjectId("5bf4b591a06509544b8cf75b") ] } }, { projection: {} })
[
  {
    "_id": {
      "_id": "5bf4b591a06509544b8cf75c",
      "name": "Second Account",
      "contactName": "Second Person",
      "contactEmail": "[email protected]",
      "__v": 0
    },
    "total": 100,
    "lineItems": [
      {
        "_id": "5bf4b591a06509544b8cf761",
        "address": "21 Jump street Anywhere NY 5432"
      }
    ]
  },
  {
    "_id": {
      "_id": "5bf4b591a06509544b8cf75b",
      "name": "First Account",
      "contactName": "First Person",
      "contactEmail": "[email protected]",
      "__v": 0
    },
    "total": 200,
    "lineItems": [
      {
        "_id": "5bf4b591a06509544b8cf75e",
        "address": "1 Some street Somewhere TX 1234"
      },
      {
        "_id": "5bf4b591a06509544b8cf75f",
        "address": "2 Other street Elsewhere CA 5678"
      }
    ]
  }
]
Comport answered 21/11, 2018 at 1:44 Comment(6)
Wow this is an incredibly detailed response. Thank you and I will take some time to digest this tomorrow and get back to you. Really informative. Love learning this!Hieratic
This is exactly what I was looking for. I made some tweaks to fit my exact needs but this got me across the finish line! Thanks! I have updated my question with the end result for anyone needing my exact use case.Hieratic
some answers need to have a medal award mechanism with those, this is exactly an answer like that!Yoong
Great answer! Any suggestions what to use with shard collections? Because you can't use lookup in that caseSeedman
Not all heroes wear capes, but some definitely write code :)Perkin
No Sharding with lookup which removes the possibility of horizontal scaling down the line.Kanzu
O
0
  1. It shows _id insead of _accountId because when you use $group, the results are grouped by the specified _accountId, so it becames the new _id of the document.
  2. There are two possible solutions for moving contactName, contactEmail and name to the top level:
    • One is handling it with javascript after it is populated. To achieve that, you can use function 'map()'.
    • The other solution is to use $lookup in the aggregation pipeline to populate the doccument in the same mongoDB query, and after $lookup, you must use $project again to build the output document as desired.
Often answered 20/11, 2018 at 23:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.