How do I unset all fields except a known set of fields?

Suppose I have a single document in my mongo collection that looks like this:

{
    "_id": 123,
    "field_to_prune": 
    {
        "keep_field_1": "some value",
        "random_field_1": "some value",
        "keep_field_2": "some value",
        "random_field_2": "some value",
        "random_field_3": "some value"
    }
}

I want to prune that document to look like this:

{
    "_id": 123,
    "field_to_prune": 
    {
        "keep_field_1": "some value",
        "keep_field_2": "some value"
    }
}

However, my issue is that I don't know what the "random" field names are. In MongoDB, how would I $unset all fields except a couple of known fields?

I can think of a couple of ways, but I don't know the syntax. I could select all the field names and then unset each of them, something like this:

[Some query to find all field names under "field_to_prune" for id 123].forEach(function(i) { 
    var key = "field_to_prune." + i;
    print("removing field: " + key);
    var mod = {"$unset": {}};
    mod["$unset"][key] = "";

    db.myCollection.update({ _id: "123" }, mod);
});

Another way I was thinking of was to unset every field whose name is not in an array of strings that I define, but I'm not sure how to do that either. Any ideas?

Chandos answered 18/10, 2013 at 21:8 Comment(0)
S
2

If you don't care about atomicity then you may do it with save:

doc = db.myCollection.findOne({"_id": 123});
for (k in doc.field_to_prune) {
  if (k === 'keep_field_1') continue;
  if (k === 'keep_field_2') continue;
  delete doc.field_to_prune[k];
}
db.myCollection.save(doc);

The main problem with this solution is that it's not atomic: any update made to the document between findOne and save will be lost.

An alternative is to actually unset all the unwanted fields instead of saving the whole document:

doc = db.myCollection.findOne({"_id": 123});
unset = {};
for (k in doc.field_to_prune) {
  if (k === 'keep_field_1') continue;
  if (k === 'keep_field_2') continue;
  unset['field_to_prune.'+k] = 1;
}
db.myCollection.update({_id: doc._id}, {$unset: unset});

This solution is much better because MongoDB applies the update atomically and only touches the fields being removed, so no concurrent update is lost. You also don't need another collection to do what you want.
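The loop that collects the fields to remove can be factored into a small function that works for any keep-list. This is only a sketch in plain JavaScript: buildUnsetSpec is a hypothetical helper name, not part of the shell or any driver, and the sample document is the one from the question.

```javascript
// Hypothetical helper: build the argument for $unset from a sub-document,
// listing every key that is NOT in the keep-list.
function buildUnsetSpec(subdoc, prefix, keep) {
  const spec = {};
  for (const key of Object.keys(subdoc || {})) {
    if (!keep.includes(key)) {
      spec[prefix + "." + key] = "";
    }
  }
  return spec;
}

// Sample document taken from the question
const doc = {
  _id: 123,
  field_to_prune: {
    keep_field_1: "some value",
    random_field_1: "some value",
    keep_field_2: "some value",
    random_field_2: "some value",
    random_field_3: "some value"
  }
};

const unsetSpec = buildUnsetSpec(
  doc.field_to_prune,
  "field_to_prune",
  ["keep_field_1", "keep_field_2"]
);
console.log(unsetSpec);

// In the shell the result would be used only when it is non-empty,
// since some server versions reject an empty $unset:
// if (Object.keys(unsetSpec).length > 0) {
//   db.myCollection.update({ _id: doc._id }, { $unset: unsetSpec });
// }
```

Keeping the keep-list as an array argument avoids one `if ... continue` line per field when more than a couple of fields have to survive.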

Shingly answered 18/10, 2013 at 22:16 Comment(1)
Although this works and is fine for a single document, it is inefficient if you need to update more than one document. - Saleratus
B
3

Unfortunately, all the solutions presented so far rely on script execution and some sort of forEach invocation, which ends up handling only one document at a time. If the collection to normalize is big, this is going to be impractical and take far too long.

Also, the functions passed to forEach are executed on the client, meaning that if the connection to the database is lost, the operation is interrupted in the middle of the process, potentially leaving the collection in an inconsistent state.

Performance issues could be mitigated by using bulk operations like the one proposed by @styvane here. That's solid advice.

But we can do better. Update operations have supported aggregation pipeline syntax since MongoDB 4.2, so the normalization can be done by building a temporary object that contains only the desired fields, unsetting the old field, and then putting the temporary object back in its place, all using the document's current values as references:

db.theCollection.updateMany(
  {field_to_prune: {$exists: true}},
  [
    {$set: {_temp: {
      keep_field_1: '$field_to_prune.keep_field_1',
      keep_field_2: '$field_to_prune.keep_field_2'
    }}},
    {$unset: 'field_to_prune'},
    {$set: {field_to_prune: '$_temp'}},
    {$unset: '_temp'}
  ]
)
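
The four stages above follow a fixed pattern, so they can be generated for any field and keep-list. The following is a minimal sketch in plain JavaScript; buildPrunePipeline and its tempField parameter are hypothetical names introduced here for illustration, and the stages it returns simply mirror the ones shown above.

```javascript
// Hypothetical helper: generate the set/unset/set/unset stages
// for an arbitrary embedded field and list of sub-fields to keep.
function buildPrunePipeline(field, keepFields, tempField = "_temp") {
  const kept = {};
  for (const name of keepFields) {
    // Reference the current value, e.g. "$field_to_prune.keep_field_1"
    kept[name] = "$" + field + "." + name;
  }
  return [
    { $set: { [tempField]: kept } },
    { $unset: field },
    { $set: { [field]: "$" + tempField } },
    { $unset: tempField }
  ];
}

const pipeline = buildPrunePipeline("field_to_prune", ["keep_field_1", "keep_field_2"]);
console.log(JSON.stringify(pipeline, null, 2));

// In the shell (MongoDB 4.2 or later) it would be passed as the update argument:
// db.theCollection.updateMany({ field_to_prune: { $exists: true } }, pipeline);
```

The temporary field name is a parameter because it must not collide with a real top-level field of the documents being updated.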

Example:

> db.myColl.insertOne({
...   _id: 123,
...   field_to_prune: {
...     keep_field_1: "some value",
...     random_field_1: "some value",
...     keep_field_2: "some value",
...     random_field_2: "some value",
...     random_field_3: "some value"
...   }
... })
{ "acknowledged" : true, "insertedId" : 123 }
>
> db.myColl.insertOne({
...   _id: 234,
...   field_to_prune: {
...     // keep_field_1 is absent
...     random_field_1: "some value",
...     keep_field_2: "some value",
...     random_field_2: "some value",
...     random_field_3: "some value"
...   }
... })
{ "acknowledged" : true, "insertedId" : 234 }
>
> db.myColl.find()
{ "_id" : 123, "field_to_prune" : { "keep_field_1" : "some value", "random_field_1" : "some value", "keep_field_2" : "some value", "random_field_2" : "some value", "random_field_3" : "some value" } }
{ "_id" : 234, "field_to_prune" : { "random_field_1" : "some value", "keep_field_2" : "some value", "random_field_2" : "some value", "random_field_3" : "some value" } }
>
> db.myColl.updateMany(
...  {field_to_prune: {$exists: true}},
...  [
...    {$set: {_temp: {
...      keep_field_1: '$field_to_prune.keep_field_1',
...      keep_field_2: '$field_to_prune.keep_field_2'
...    }}},
...    {$unset: 'field_to_prune'},
...    {$set: {field_to_prune: '$_temp'}},
...    {$unset: '_temp'}
...  ]
...)
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
>
> db.myColl.find()
{ "_id" : 123, "field_to_prune" : { "keep_field_1" : "some value", "keep_field_2" : "some value" } }
{ "_id" : 234, "field_to_prune" : { "keep_field_2" : "some value" } }

Berfield answered 1/3, 2022 at 11:38 Comment(2)
Could this also work if all the fields are at the top level of the document? - Unstressed
The {$unset: 'field_to_prune'} stage is unnecessary because the $set replaces the whole field_to_prune with the new _temp value. - Helico
S
2

Actually, the best way to do this is to iterate over the cursor and use the $unset update operator to remove every field in the subdocument except the known fields you want to keep. You also need to use "bulk" operations for maximum efficiency.

MongoDB 3.2 deprecates Bulk() and its associated methods, so you should use .bulkWrite() instead:

var wantedFields = ["keep_field_1", "keep_field_2"];

var requests = [];
db.myCollection.find().forEach(function(document) {
    var fieldToPrune = document.field_to_prune;
    var unsetOp = {};
    for (var key in fieldToPrune) {
        if ((wantedFields.indexOf(key) === -1) && Object.prototype.hasOwnProperty.call(fieldToPrune, key)) {
            unsetOp["field_to_prune." + key] = "";
        }
    }
    // Skip documents that have nothing to remove
    if (Object.keys(unsetOp).length === 0) {
        return;
    }
    requests.push({
        "updateOne": {
            "filter": { "_id": document._id },
            "update": { "$unset": unsetOp }
        }
    });
    if (requests.length === 1000) {
        // Execute every 1000 operations and re-initialize the queue
        db.myCollection.bulkWrite(requests);
        requests = [];
    }
});

// Flush the remaining operations, if any
if (requests.length > 0) {
    db.myCollection.bulkWrite(requests);
}
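
The batching itself can also be separated from the cursor loop. The sketch below, in plain JavaScript, splits an already-built list of requests into batches of at most 1000, so an empty batch is never produced; as far as I know bulkWrite rejects an empty list of operations. toBatches is a hypothetical helper name, and the requests are placeholders. Unlike the loop above, this variant holds all requests in memory first, so it only suits collections of moderate size.

```javascript
// Hypothetical helper: split a list of requests into batches of at most `size`.
function toBatches(requests, size) {
  const batches = [];
  for (let i = 0; i < requests.length; i += size) {
    batches.push(requests.slice(i, i + size));
  }
  return batches;
}

// 2500 placeholder updateOne requests, shaped like the ones built above
const requests = Array.from({ length: 2500 }, (_, i) => ({
  updateOne: {
    filter: { _id: i },
    update: { $unset: { "field_to_prune.random_field_1": "" } }
  }
}));

const batches = toBatches(requests, 1000);
console.log(batches.map(b => b.length)); // [ 1000, 1000, 500 ]

// In the shell each batch would then be sent separately:
// batches.forEach(function (b) { db.myCollection.bulkWrite(b); });
```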

From MongoDB 2.6 you can use the Bulk API:

var wantedFields = ["keep_field_1", "keep_field_2"];
var bulk = db.myCollection.initializeUnorderedBulkOp();
var count = 0;

db.myCollection.find().forEach(function(document) {
    var fieldToPrune = document.field_to_prune;
    var unsetOp = {};
    for (var key in fieldToPrune) {
        if ((wantedFields.indexOf(key) === -1) && Object.prototype.hasOwnProperty.call(fieldToPrune, key)) {
            unsetOp["field_to_prune." + key] = "";
        }
    }
    // Skip documents that have nothing to remove
    if (Object.keys(unsetOp).length === 0) {
        return;
    }
    bulk.find({ "_id": document._id }).updateOne({ "$unset": unsetOp });
    count++;
    if (count % 1000 === 0) {
        // Execute every 1000 operations and re-initialize the bulk builder
        bulk.execute();
        bulk = db.myCollection.initializeUnorderedBulkOp();
    }
});

// Flush the remaining operations, if any
if (count % 1000 !== 0) {
    bulk.execute();
}
Saleratus answered 15/3, 2016 at 18:12 Comment(0)
C
0

I solved this with a temporary collection. I did the following:

db.myCollection.find({"_id": "123"}).forEach(function(i) {
    db.temp.insert(i);
});

db.myCollection.update(
    {_id: "123"}, 
    { $unset: { "field_to_prune": ""}}
)

db.temp.find().forEach(function(i) {
    var key1 = "field_to_prune.keep_field_1";
    var key2 = "field_to_prune.keep_field_2";
    var mod = {"$set": {}};
    mod["$set"][key1] = i.field_to_prune.keep_field_1;
    mod["$set"][key2] = i.field_to_prune.keep_field_2;

    db.myCollection.update({_id: "123"}, mod)
});

db.getCollection("temp").drop();
Chandos answered 18/10, 2013 at 21:30 Comment(0)
U
0

For people still looking for an easier solution, I found one using $replaceWith.

This works with both .update and .updateMany:

db.products.updateMany(
  {},
  [
    {
      $replaceWith: {
        name: "$name",
        // nested fields are written as a nested document, since dotted keys
        // are not accepted in an object expression (I didn't test this)
        field_to_prune: {
          keep_field_1: "$field_to_prune.keep_field_1",
          keep_field_2: "$field_to_prune.keep_field_2"
        }
      }
    }
  ]
);

_id is kept by default, so there is no need to specify it.

The {} passed to $replaceWith is the whole final document, so only the fields specified there are kept. Because you are replacing the document, you can refer to the old data by prefixing the source field path with $.

For people using it with .updateMany, keep in mind that it will fail if you are replacing with a non-existent value.

For more information, see the documentation.
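
If the dotted keys turn out to be rejected (object expressions in the aggregation language generally do not accept "." in field names, as far as I know), the nested form of the replacement document can be generated from a list of dotted paths. This is a sketch in plain JavaScript; buildReplacement is a hypothetical helper name, and the paths are the ones used in the example above.

```javascript
// Hypothetical helper: turn dotted paths into a nested replacement document
// whose leaves reference the current value of the same path.
function buildReplacement(paths) {
  const root = {};
  for (const path of paths) {
    const parts = path.split(".");
    let node = root;
    // Walk (and create) the intermediate objects
    for (const part of parts.slice(0, -1)) {
      if (typeof node[part] !== "object" || node[part] === null) {
        node[part] = {};
      }
      node = node[part];
    }
    // The leaf holds a field reference such as "$field_to_prune.keep_field_1"
    node[parts[parts.length - 1]] = "$" + path;
  }
  return root;
}

const replacement = buildReplacement([
  "name",
  "field_to_prune.keep_field_1",
  "field_to_prune.keep_field_2"
]);
console.log(JSON.stringify(replacement, null, 2));

// In the shell (MongoDB 4.2 or later):
// db.products.updateMany({}, [{ $replaceWith: replacement }]);
```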

Unstressed answered 7/2 at 16:25 Comment(0)
F
-1

Here is my solution, which I think is easier than the others I have read:

db.labels.find({"_id" : ObjectId("123")}).snapshot().forEach(
function (elem) {
db.labels.update({_id: elem._id},
{'field_to_prune.keep_field_1': elem.field_to_prune.keep_field_1, 
 'field_to_prune.keep_field_2': elem.field_to_prune.keep_field_2});
});

I'm deleting everything except the fields 'keep_field_1' and 'keep_field_2'.

Foggy answered 6/4, 2016 at 14:50 Comment(4)
This is bad. You should do this. - Saleratus
This what? And why is it bad? - Foggy
Sorry, I should have explained why. As you can see from my answer, there is a much better way to do this. The first thing is that if you have n documents in your collection, your solution implies that you will hit your database n times, which is bad for performance. That being said, I also fail to see how your solution solves OP's problem. - Saleratus
The iteration is not the point of the question; in fact the question was about one single document, and my solution is about one single document as well. Besides, my solution totally solves the problem. - Foggy
