Unfortunately all the solutions presented so far are relying on script execution and some sort of forEach
invocation, which will end up handling only one document at a time. If the collection to normalize is big this is going to be impractical and take way too long.
Also the functions passed to forEach
are executed on the client, meaning that if the connection to the database is lost, the operation is going to be interrupted in the middle of the process, potentially leaving the collection in inconsistent state.
Performance issues could be mitigated by using bulk operations like the one proposed by @styvane here. That's solid advice.
But we can do better. Update operations support aggregation pipeline syntax since MongoDB 4.2, allowing the data normalization operation to be achieved by simply creating a new temporary object containing only the desired fields, unset the old one and then putting the temporary one back in its place, all using with the current values of the document as references:
db.theCollection.updateMany(
{field_to_prune: {$exists: true}},
[
{$set: {_temp: {
keep_field_1: '$field_to_prune.keep_field_1',
keep_field_2: '$field_to_prune.keep_field_2'
}}},
{$unset: 'field_to_prune'},
{$set: {field_to_prune: '$_temp'}},
{$unset: '_temp'}
]
)
Example:
> db.myColl.insertOne({
... _id: 123,
... field_to_prune: {
... keep_field_1: "some value",
... random_field_1: "some value",
... keep_field_2: "some value",
... random_field_2: "some value",
... random_field_3: "some value"
... }
... })
{ "acknowledged" : true, "insertedId" : 123 }
>
> db.myColl.insertOne({
... _id: 234,
... field_to_prune: {
... // keep_field_1 is absent
... random_field_1: "some value",
... keep_field_2: "some value",
... random_field_2: "some value",
... random_field_3: "some value"
... }
... })
{ "acknowledged" : true, "insertedId" : 234 }
>
> db.myColl.find()
{ "_id" : 123, "field_to_prune" : { "keep_field_1" : "some value", "random_field_1" : "some value", "keep_field_2" : "some value", "random_field_2" : "some value", "random_field_3" : "some value" } }
{ "_id" : 234, "field_to_prune" : { "random_field_1" : "some value", "keep_field_2" : "some value", "random_field_2" : "some value", "random_field_3" : "some value" } }
>
> db.myColl.updateMany(
... {field_to_prune: {$exists: true}},
... [
... {$set: {_temp: {
... keep_field_1: '$field_to_prune.keep_field_1',
... keep_field_2: '$field_to_prune.keep_field_2'
... }}},
... {$unset: 'field_to_prune'},
... {$set: {field_to_prune: '$_temp'}},
... {$unset: '_temp'}
... ]
...)
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
>
> db.myColl.find()
{ "_id" : 123, "field_to_prune" : { "keep_field_1" : "some value", "keep_field_2" : "some value" } }
{ "_id" : 234, "field_to_prune" : { "keep_field_2" : "some value" } }