I'm a little late, but the forEach version above can be made roughly five times faster by batching the updates with bulkWrite instead of calling save per document: about 0.25 ms instead of 1.2 ms per document in my runs.
Note: my data required cleaning up timestamp strings first; the cleanup is identical in both runs, so it does not affect the write-time comparison.
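(For reference, the cleanup just strips a ' UTC' suffix so the shell's ISODate helper can parse the remainder; the sample value below is made up, adjust it to your data:)

s = "2023-05-15 07:20:30 UTC";        // hypothetical sample value
new ISODate(s.replace(' UTC', ''));   // ISODate("2023-05-15T07:20:30Z") in the legacy shell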
Without bulkWrite, i.e. with save:
mongo> i = 0;
0
mongo> start = new Date();
ISODate("2023-05-15T07:20:30.231Z")
mongo> db.myCol.find().forEach( d => {
    if (typeof d.timeslot != "string") { return; }   // already converted, skip
    d.timeslot = new ISODate(d.timeslot.replace(' UTC', ''));
    db.myCol.save(d);                                // one round trip per document
    i += 1;
    if (i % 1000 == 0) {                             // report timing every 1000 documents
        end = new Date();
        diff = end.valueOf() - start.valueOf();
        printjson({
            ms: diff,
            n: 1000,
            avg_ms: diff / 1000
        });
        start = new Date();
    }
});
{ "ms" : 12722, "n" : 1000, "avg_ms" : 12.722 }
{ "ms" : 1163, "n" : 1000, "avg_ms" : 1.163 }
{ "ms" : 1208, "n" : 1000, "avg_ms" : 1.208 }
{ "ms" : 1183, "n" : 1000, "avg_ms" : 1.183 }
{ "ms" : 1168, "n" : 1000, "avg_ms" : 1.168 }
: (output continues at roughly 1.2 ms per document)
With bulkWrite:
mongo> blk = [];
[ ]
mongo> start = new Date();
ISODate("2023-05-15T07:27:16.882Z")
mongo> db.myCol.find().forEach( d => {
    if (typeof d.timeslot != "string") { return; }   // already converted, skip
    blk.push({ updateOne: { filter: { _id: d._id },
                            update: { $set: { timeslot: new ISODate(d.timeslot.replace(' UTC', '')) } } } });
    if (blk.length > 0 && blk.length % 1000 == 0) {  // send a batch of 1000 in one round trip
        res = db.myCol.bulkWrite(blk);
        end = new Date();
        diff = end.valueOf() - start.valueOf();
        printjson({
            ms: diff,
            n: res.matchedCount,
            avg_ms: diff / res.matchedCount
        });
        start = new Date();
        blk = [];
    }
});
{ "ms" : 9745, "n" : 1000, "avg_ms" : 9.745 }
{ "ms" : 252, "n" : 1000, "avg_ms" : 0.252 }
{ "ms" : 231, "n" : 1000, "avg_ms" : 0.231 }
{ "ms" : 213, "n" : 1000, "avg_ms" : 0.213 }
{ "ms" : 209, "n" : 1000, "avg_ms" : 0.209 }
: (output continues at roughly 0.25 ms per document)
mongo> if (blk.length > 0) { db.myCol.bulkWrite(blk); }   // flush the remaining partial batch; bulkWrite throws on an empty array
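For completeness: on MongoDB 4.2+ you can avoid the client-side loop entirely by doing the conversion server-side with a single updateMany and an aggregation-pipeline update. A minimal sketch, assuming the same timeslot field as above and MongoDB 4.4+ for $replaceAll; the format string is hypothetical, adjust it to your data:

db.myCol.updateMany(
    { timeslot: { $type: "string" } },   // only documents that still hold strings
    [ { $set: { timeslot: {
        $dateFromString: {
            dateString: { $replaceAll: { input: "$timeslot", find: " UTC", replacement: "" } },
            format: "%Y-%m-%d %H:%M:%S",  // hypothetical; match your string layout
            timezone: "UTC"
        } } } } ]
);

And if you stay with the bulkWrite loop, passing { ordered: false } as the second argument to bulkWrite may speed things up further, since the server is then free to apply the operations in any order.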