String field value length in mongoDB
Asked Answered
S

7

122

The data type of the field is String. I would like to fetch the data where character length of field name is greater than 40.

I tried these queries but returning error. 1.

db.usercollection.find(
{$where: "(this.name.length > 40)"}
).limit(2);

output :error: {
    "$err" : "TypeError: Cannot read property 'length' of undefined near '40)' ",
    "code" : 16722
}

this is working in 2.4.9 But my version is 2.6.5

Shakiashaking answered 11/4, 2015 at 12:10 Comment(1)
Did you try gt instead of >Messapian
O
239

For MongoDB 3.6 and newer:

The $expr operator allows the use of aggregation expressions within the query language, thus you can leverage the use of $strLenCP operator to check the length of the string as follows:

db.usercollection.find({ 
    name: { $exists: true },
    $expr: { $gt: [{ $strLenCP: '$name' }, 40] } 
})

For MongoDB 3.4 and newer:

You can also use the aggregation framework with the $redact pipeline operator that allows you to proccess the logical condition with the $cond operator and uses the special operations $$KEEP to "keep" the document where the logical condition is true or $$PRUNE to "remove" the document where the condition was false.

This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query and then a subsequent $match, except that $redact uses a single pipeline stage which is more efficient.

As for the logical condition, there are String Aggregation Operators that you can use $strLenCP operator to check the length of the string. If the length is $gt a specified value, then this is a true match and the document is "kept". Otherwise it is "pruned" and discarded.


Consider running the following aggregate operation which demonstrates the above concept:

db.usercollection.aggregate([
    { $match: { name: { $exists: true } } },
    { $redact: {
         $cond: [
            { $gt: [ { $strLenCP: "$name" }, 40] },
            "$$KEEP",
            "$$PRUNE"
        ]
    } },
    { $limit: 2 }
])

If using $where, try your query without the enclosing brackets:

db.usercollection.find({ $where: "this.name.length > 40" }).limit(2);

A better query would be to to check for the field's existence and then check the length:

db.usercollection.find({ name: { $type: 2 }, $where: "this.name.length > 40" }).limit(2); 

or:

db.usercollection.find({ name: { $exists: true }, $where: "this.name.length > 
40" }).limit(2); 

MongoDB evaluates non-$where query operations before $where expressions and non-$where query statements may use an index. A much better performance is to store the length of the string as another field and then you can index or search on it; applying $where will be much slower compared to that. It's recommended to use JavaScript expressions and the $where operator as a last resort when you can't structure the data in any other way, or when you are dealing with a small subset of data.


A different and faster approach that avoids the use of the $where operator is the $regex operator. Consider the following pattern which searches for

db.usercollection.find({"name": {"$type": 2, "$regex": /^.{41,}$/}}).limit(2); 

Note - From the docs:

If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

A regular expression is a “prefix expression” if it starts with a caret (^) or a left anchor (\A), followed by a string of simple symbols. For example, the regex /^abc.*/ will be optimized by matching only against the values from the index that start with abc.

Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.

Overman answered 11/4, 2015 at 12:45 Comment(8)
All the 3 queries working fine. But the first one is accepting upto 15. ie "this.name.length > 15". If we give 16 or above it is giving the same error.Shakiashaking
If we want to execute the same for inner document like profile.name then can you please suggest a syntax for it.Shakiashaking
For an embedded document field try db.usercollection.find({"profile.name": {$type: 2}, $where: "this.profile.name.length > 40"}).limit(2); Overman
working perfect but i was wondering, is there any other way also of doing this. like Mongo Way db.collection.find({'country.length':{$gt:20}}) something like it.Maid
@chridam's answer is perfect! But just wanted to emphasize the fact that the error is more to do with checking the existence of the field in the record rather than the presence / absence of brackets. Even with / without the brackets it did not work for me. As such, db.usercollection.find({name: {$exists: true}, $where: "this.name.length > 40"}).limit(2); is the right query and will always work.Unbuckle
My requirement is to get the length of the string field in the result. As we don't have support for "strLenCP" in 3.0/prior versions, how can we achieve that?Guzman
@sureshgoud Would suggest you create a new question for this as comments are more likely to be purged in the long-runOverman
#51041384Guzman
P
29

Queries with $where and $expr are slow if there are too many documents.

Using $regex is much faster than $where, $expr.

db.usercollection.find({ 
  "name": /^[\s\S]{40,}$/, // name.length >= 40
})

or 

db.usercollection.find({ 
  "name": { "$regex": "^[\s\S]{40,}$" }, // name.length >= 40
})

This query is the same meaning with

db.usercollection.find({ 
  "$where": "this.name && this.name.length >= 40",
})

or

db.usercollection.find({ 
    "name": { "$exists": true },
    "$expr": { "$gte": [ { "$strLenCP": "$name" }, 40 ] } 
})

I tested each queries for my collection.

# find
$where: 10529.359ms
$expr: 5305.801ms
$regex: 2516.124ms

# count
$where: 10872.006ms
$expr: 2630.155ms
$regex: 158.066ms
Pythian answered 17/5, 2019 at 6:47 Comment(0)
M
9

Here is one of the way in mongodb you can achieve this.

db.usercollection.find({ $where: 'this.name.length < 4' })
Mortar answered 27/10, 2018 at 18:22 Comment(3)
as this query uses javascript expression, this can't use the mongodb indexes, so it's slow.Mauceri
$where may also not be allowed on free MongoDB SaaS tiers (e.g. it isn't on the Atlas 512MB plan).Alonzoaloof
That's right. As far as I know, Even AWS Document DB also doesn't support it as of now.Mortar
P
9

Find anything with a name with 40 or more characters:

db.usercollection.find({name: /.{40}/})

(simplified the RegEx from Fumiya Karasawa's answer)

Procathedral answered 31/12, 2022 at 6:2 Comment(0)
M
7

This query will give both field value and length:

db.usercollection.aggregate([
{
    $project: {
        "name": 1,
        "length": { $strLenCP: "$name" }
    }} ])
Mattox answered 11/9, 2019 at 6:19 Comment(1)
and if one doesnt want to use aggregate then it can be db.collection.find({$expr: {$lt: [{$strLenCP: "$name"}, 20]}})Gunnery
P
3

I had a similar kind of scenario, but in my case string is not a 1st level attribute. It is inside an object. In here I couldn't find a suitable answer for it. So I thought to share my solution with you all(Hope this will help anyone with a similar kind of problem).

Parent Collection 

{
"Child":
{
"name":"Random Name",
"Age:"09"
}
}

Ex: If we need to get only collections that having child's name's length is higher than 10 characters.

 db.getCollection('Parent').find({$where: function() { 
for (var field in this.Child.name) { 
    if (this.Child.name.length > 10) 
        return true;

}
}})
Process answered 15/6, 2017 at 9:14 Comment(0)
A
0

Found a better solution:

Sample Data from MongoDB

Here word is field which contains a string of various length.

const result = await client
    .db("colors")
    .collection("words")
    .aggregate([
      {
        $addFields: {
          length: { $strLenCP: "$word" },
        },
      },
      {
        $match: {
          length: { $gt: 5 },
        },
      },
      {
        $project: {
          word: "$word",
          length: "$length",
        },
      }, 
    ]);
Amling answered 4/1 at 6:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.