MongoDB aggregate by field exists
Asked Answered
N

10

101

I have a hard time believing this question hasn't been asked and answered somewhere already, but I can't find any trace of it.

I have a MongoDB aggregation query that needs to group by a boolean: the existence of another field.

For example let's start with this collection:

> db.test.find()
{ "_id" : ObjectId("53fbede62827b89e4f86c12e"),
  "field" : ObjectId("53fbede62827b89e4f86c12d"), "name" : "Erik" }
{ "_id" : ObjectId("53fbee002827b89e4f86c12f"), "name" : "Erik" }
{ "_id" : ObjectId("53fbee092827b89e4f86c131"),
  "field" : ObjectId("53fbee092827b89e4f86c130"), "name" : "John" }
{ "_id" : ObjectId("53fbee122827b89e4f86c132"), "name" : "Ben" }

2 documents have "field", and 2 don't. Note that each value of "field" may be different; we just want to group on its existence (or non-nullness works for me too, I don't have any null values stored).

I've tried using $project, but $exists doesn't exist there, and $cond and $ifNull haven't helped me. The field always appears to exist, even when it doesn't:

> db.test.aggregate(
  {$project:{fieldExists:{$cond:[{$eq:["$field", null]}, false, true]}}},
  {$group:{_id:"$fieldExists", count:{$sum:1}}}
)
{ "_id" : true, "count" : 4 }

I would expect the following much simpler aggregate to work, but for some reason $exists isn't supported in this way:

> db.test.aggregate({$group:{_id:{$exists:"$field"}, count:{$sum:1}}})
assert: command failed: {
  "errmsg" : "exception: invalid operator '$exists'",
  "code" : 15999,
  "ok" : 0
} : aggregate failed
Error: command failed: {
  "errmsg" : "exception: invalid operator '$exists'",
  "code" : 15999,
  "ok" : 0
} : aggregate failed
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.commandWorked (src/mongo/shell/assert.js:244:5)
    at DBCollection.aggregate (src/mongo/shell/collection.js:1149:12)
    at (shell):1:9
2014-08-25T19:19:42.344-0700 Error: command failed: {
  "errmsg" : "exception: invalid operator '$exists'",
  "code" : 15999,
  "ok" : 0
} : aggregate failed at src/mongo/shell/assert.js:13

Does anyone know how to get the desired result from a collection like this?

Expected result:

{ "_id" : true, "count" : 2 }
{ "_id" : false, "count" : 2 }
National answered 26/8, 2014 at 2:31 Comment(0)
D
142

I solved the same problem just last night, this way:

> db.test.aggregate({$group:{_id:{$gt:["$field", null]}, count:{$sum:1}}})
{ "_id" : true, "count" : 2 }
{ "_id" : false, "count" : 2 }

See http://docs.mongodb.org/manual/reference/bson-types/#bson-types-comparison-order for a full explanation of how this works.

Added From comment section:

To check if the value doesn't exist or is null use { $lte: ["$field", null] }

Dantzler answered 26/8, 2014 at 20:59 Comment(7)
Nice, clever solution! I didn't realize BSON defined comparisons across different types.National
It's certainly clever, and I don’t have any better solution, but it feels like a hack.Dualpurpose
To check if the value doesn't exist or is null use { $lte: ["$field", null] }Modification
@RickyBoyce thanks for that tip. I was using $eq and couldn't figure out why I wasn't getting the expected result. Switched to $lte and now everything works fine.Gravure
I don't know why this is not mentioned but for me $match: { var_name: { $exists : true } } work totally fine. I have not tested this for grouping the aggregations.Serb
Used this solution in 2023, solutions below involving checks for null / undefined did not seem to work in agg. pipeline.Hap
@Serb your syntax covers $match not expression operators.Stere
H
59

I solved it with checking for undefined

$ne : [$var_to_check, undefined]

or

$ne:  [ { $type : "$var_to_check"}, 'missing'] }

This returns true if the var is defined

Hipparch answered 7/9, 2017 at 5:44 Comment(3)
I'd really recommend the second approach over the one that is marked as answered. other approaches seems to be hacky. docs.mongodb.com/manual/reference/operator/aggregation/typeNeuburger
The second approach worked in aggregation pipeline, thanks @HipparchPublia
undefined seems to be deprecated so the second looks like the only valid solutionLipchitz
R
29

The $exists operator is a "query" operator, so it is used basically to "filter" results rather than identify a logical condition.

As a "logical" operator the aggregation framework supports the $ifNull operator. This returns the field value where it exists or the alternate supplied value where it does not or otherwise evaluates to null

db.test.aggregate([
    { "$group": {
        "_id": { "$ifNull": [ "$field", false ] },
        "count": { "$sum": 1 }
    }}
])

But of course, even that is not a "true/false" comparison, so unless you actually want to return the actual value of the field where it is present, then you are probably better off with a $cond statement much like you have:

db.test.aggregate([
    { "$group": {
        "_id": { "$cond": [{ "$eq": [ "$field", null ] }, true, false ] },
        "count": { "$sum": 1 }
    }}
])

Where $ifNull can be very useful is in replacing not existent array fields that would otherwise cause an error using $unwind. You can then do something like return a single element or empty array so this does not cause problems in the rest of your pipeline processing.

Rapids answered 26/8, 2014 at 3:8 Comment(4)
As I pointed out in the OP, your solution gives the wrong result: { "_id" : false, "count" : 4 } Thanks for answering though.National
actually, the $eq returns false all the time, even if the field exists. but if using $gt it works. use "_id": { "$cond": [{ "$gt": [ "$field", null ] }, true, false ] } insteadPractical
$eq does not check if field exists. Thanks you @RomanBlachman for $gt tipFordone
Is there a difference when using $match? Because I just tried $match: { var_name: { $exists : true } } and the aggregation worked fine.Serb
D
26

Dunno how it was but now in 2019 there is clean solution. In aggregation pipeline do this

$match: {"my_field": {$ne: null}}

Nice thing is in my lang 'ne' means not :)

Donets answered 7/2, 2019 at 13:6 Comment(4)
I think might not work in aggregation pipeline, ended-up using $gtYester
The above query will return documents where my_field does not exist. That's not what is expected.Flita
ne means not equal.Medovich
we are the knights who say, "ne"!Griffy
O
11

A semantically transparent solution to check if a field exists and is not null:

{ $ne: [{ $ifNull: ["$field", null] }, null] }

To check if it is missing, replace $ne with $eq.

Outline answered 5/8, 2021 at 20:5 Comment(0)
M
8

In short

{'$project': {
    'field_exists': {'$or': [
        {'$eq': ['$field', null]}, 
        {'$gt': ['$field', null]},
    ]},
}}

Details

$exists means that the field exists, even if it is null or any other empty value. That is why all the answers on this page are incorrect.

Let's test a bit. Check this:

// Let's take any collection that have docs
db.getCollection('collection').aggregate([
  // Get arbitrary doc, no matter which, we won't use it
  {"$limit": 1},
  // Project our own fields (just create them with $literal)
  {'$project': {
    '_id': 0,
    'null_field': {'$literal': null},
    'not_null_field': {'$literal': {}},
  }},
])

We'll get this:

{
    "null_field" : null,
    "not_null_field" : {}
}

Then let's clarify which fields exist in this doc:

  1. null_field - exists
  2. not_null_field - exists
  3. non_existent_field - doesn't.

Okay, it's time to test project stage I've mentioned above. Let's add it for every field we're interested in:

{'$project': {
    'null_field_exists': {'$or': [
        {'$eq': ['$null_field', null]}, 
        {'$gt': ['$null_field', null]},
    ]},
    'not_null_field_exists': {'$or': [
        {'$eq': ['$not_null_field', null]}, 
        {'$gt': ['$not_null_field', null]},
    ]},
    'non_existent_field_exists': {'$or': [
        {'$eq': ['$non_existent_field', null]}, 
        {'$gt': ['$non_existent_field', null]},
    ]},
}},

What we get is:

{
    "null_field_exists" : true,
    "not_null_field_exists" : true,
    "non_existent_field_exists" : false
}

Correct!

And a small note: we use null for comparing because it is the smallest value at least valuable (smaller is just the non-existence).

Medovich answered 7/3, 2019 at 13:43 Comment(0)
C
6

My answer is:

{$match:{
    $and:[{
        name:{
            $exists:true
        }
    }, {
        $expr:{
            $eq:["$$id", "$_id"]
        }
    }]
}}

I use this in lookup, on my pipeline stage. This post 2 rules the first one, name must exist. And the second thing is the relation between these 2 collection. I Am sure u can modify this for your question.

Centrosymmetric answered 14/3, 2020 at 11:44 Comment(0)
G
6

In mongoose only following working

$ne:  [ { $type : "$var_to_check"}, 'missing'] }
Gunnery answered 31/12, 2020 at 18:30 Comment(0)
T
4

I solved it by using $addFields with $ifNull, and then $match the added field by checking if its value is null.

collection.aggregate(
    [
       {
          $addFields:{
              fieldName:{
                 $ifNull:["$fieldToCheckIfExists", null]
              }
          }
       },
       {
          $match:{
              fieldName:{
                $ne: null
          }
       }
    ]
Triturate answered 7/4, 2021 at 3:49 Comment(0)
C
0

In Group, if you want to count the existing fields, but not group as the other answers shows, you can use

{
  _id: "$groupField",
  qtyExists: {
    $sum: {
      $cond: ["$field", 1, 0]
    }
  },
  qtyNotExists: {
    $sum: {
      $cond: ["$field", 0, 1]
    }
  },
}
Correction answered 2/4, 2022 at 10:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.