How to ignore duplicate documents when using insertMany in the MongoDB PHP library?

I am using the MongoDB PHP library and trying to insert some old data into MongoDB. I call the insertMany() method with a huge array of documents, some of which may be duplicates on unique indexes.

Let's say I have a users collection with these indexes:

[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.users"
    },
    {
        "v" : 1,
        "unique" : true,
        "key" : {
            "email" : 1
        },
        "name" : "shop_id_1_title_1",
        "ns" : "test.users"
    }
]
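To reproduce this setup on a fresh test collection, the unique index on email can be created from PHP. This is a sketch assuming the mongodb/mongodb library is installed via Composer and its autoloader is already loaded; ensureUniqueEmailIndex is a hypothetical helper name. createIndex() returns the index name; without a name option the library derives email_1 from the key, so it will not match the name shown in the listing above.

```php
<?php

use MongoDB\Collection;

/**
 * Hypothetical helper: creates the unique index on "email" that makes
 * duplicate inserts fail, and returns the index name the server reports.
 */
function ensureUniqueEmailIndex(Collection $users): string
{
    // Without a 'name' option the default name for this key is "email_1".
    return $users->createIndex(['email' => 1], ['unique' => true]);
}
```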

If there is a duplicate document, a MongoDB\Driver\Exception\BulkWriteException is thrown and the process stops. I want a way to skip the duplicate documents (and keep the exception from being thrown) while continuing to insert the other documents.

I found a flag called continueOnError in the php.net documentation that does the trick, but it does not seem to work with this library.

The example from php.net:

<?php

$con = new Mongo;
$db = $con->demo;

$doc1 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010001'),
        'id' => 1,
        'desc' => "ONE",
);
$doc2 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010002'),
        'id' => 2,
        'desc' => "TWO",
);
$doc3 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010002'), // same _id as above
        'id' => 3,
        'desc' => "THREE",
);
$doc4 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010004'),
        'id' => 4,
        'desc' => "FOUR",
);

$c = $db->selectCollection('c');
$c->batchInsert(
    array($doc1, $doc2, $doc3, $doc4),
    array('continueOnError' => true)
);

And here is how I tried to use the flag with the MongoDB PHP library:

<?php

require 'vendor/autoload.php';

$collection = (new MongoDB\Client)->test->users;

$collection->insertMany([
    [
        'username' => 'admin',
        'email' => '[email protected]',
        'name' => 'Admin User',
    ],
    [
        'username' => 'test',
        'email' => '[email protected]',
        'name' => 'Test User',
    ],
    [
        'username' => 'test 2',
        'email' => '[email protected]',
        'name' => 'Test User 2',
    ],
],
[
    'continueOnError' => true    // This option is not working
]);

The code above still throws the exception, so the option seems to have no effect. Is there another option flag, or some other way to do this?

Dogfight answered 25/11, 2016 at 19:23
Can you be more specific about what seems not to be working? You can add an example to help us understand. – Aluminous
@Veeram More details and an example added, as you requested. – Dogfight

Try replacing the continueOnError option with ordered and setting it to false. According to the documentation, when the ordered option is set to false, insertMany() continues writing even if a single write fails.

Here is the docs link: insertMany
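A minimal sketch of this suggestion, assuming the mongodb/mongodb library is installed via Composer and autoloaded; insertManyUnordered is a hypothetical function name. With 'ordered' => false the server attempts every document, but if any of them failed the library still throws a BulkWriteException once the batch is done, and its getWriteResult() reports what was actually written. Note that this sketch treats every write error as ignorable, not only duplicate keys.

```php
<?php

use MongoDB\Collection;
use MongoDB\Driver\Exception\BulkWriteException;

/**
 * Hypothetical helper: inserts $documents unordered and returns how many
 * were actually inserted, swallowing any write errors.
 */
function insertManyUnordered(Collection $collection, array $documents): int
{
    try {
        $result = $collection->insertMany($documents, ['ordered' => false]);

        return $result->getInsertedCount();
    } catch (BulkWriteException $e) {
        // Thrown after the whole batch was attempted; documents that did
        // not fail have already been written.
        return $e->getWriteResult()->getInsertedCount() ?? 0;
    }
}
```

Calling insertManyUnordered($collection, $documents) with the question's array would insert every user whose email is not already taken and return that count.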

Sterling answered 12/1, 2017 at 11:21

2023 answer

A problem with the accepted answer is that the operation still throws an exception whenever it skips a duplicate, even though it did exactly what was intended. You then have to determine whether the exception was caused by the intended skipping of existing records or by something else, which is not very robust.
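For reference, that check can be sketched as follows, assuming the same Composer setup. MongoDB reports duplicate-key violations with error code 11000, so after an unordered insert the code inspects each MongoDB\Driver\WriteError in the exception's write result and rethrows if anything else went wrong, including a write concern error. The names onlyDuplicateKeyErrors and insertManySkippingDuplicates are hypothetical.

```php
<?php

use MongoDB\Collection;
use MongoDB\Driver\Exception\BulkWriteException;

const DUPLICATE_KEY_ERROR = 11000;

/**
 * Hypothetical helper: true when every error is a duplicate-key error.
 * Expects MongoDB\Driver\WriteError objects (anything with getCode()).
 */
function onlyDuplicateKeyErrors(array $writeErrors): bool
{
    foreach ($writeErrors as $error) {
        if ($error->getCode() !== DUPLICATE_KEY_ERROR) {
            return false;
        }
    }

    return true;
}

/**
 * Hypothetical helper: unordered insert that ignores duplicate-key
 * failures but rethrows any other write problem.
 */
function insertManySkippingDuplicates(Collection $collection, array $documents): int
{
    try {
        return $collection->insertMany($documents, ['ordered' => false])->getInsertedCount();
    } catch (BulkWriteException $e) {
        $result = $e->getWriteResult();

        // A write concern error or any non-duplicate error is a real failure.
        if ($result->getWriteConcernError() !== null
            || !onlyDuplicateKeyErrors($result->getWriteErrors())) {
            throw $e;
        }

        return $result->getInsertedCount() ?? 0;
    }
}
```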

I would say a better solution in 2023 (MongoDB 4.2+) is to insert the new documents into a different collection, then run an aggregation pipeline on it that ends with a $merge stage using whenMatched: 'keepExisting', so that only records not already present are merged into the target collection. This is exactly what $merge is intended for.

The aggregation either succeeds as intended, skipping records that already exist in the target collection, or fails with an exception that you can then treat as a real error.

https://www.mongodb.com/docs/manual/reference/operator/aggregation/merge/#std-label-merge-whenMatched-keepExisting
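A sketch of this approach in PHP, assuming the Composer setup from the question, MongoDB 4.2+, a staging collection in the same database as the target (a plain string for into refers to the same database), and a unique index on the on field in the target, which $merge requires. The staging collection and the function names are hypothetical. It also assumes the incoming batch does not itself repeat an email; if it might, deduplicate it first.

```php
<?php

use MongoDB\Collection;

/**
 * Hypothetical helper: pipeline that merges every document of the
 * collection it runs on into $into, keeping documents already there.
 */
function keepExistingMergePipeline(string $into, string $on): array
{
    return [
        ['$merge' => [
            'into' => $into,                 // collection in the same database
            'on' => $on,                     // must have a unique index in $into
            'whenMatched' => 'keepExisting', // leave existing documents untouched
            'whenNotMatched' => 'insert',    // add documents that are new
        ]],
    ];
}

/**
 * Hypothetical helper: loads $documents into $staging, merges them into
 * $target, then drops the staging collection.
 */
function importKeepingExisting(Collection $staging, string $target, array $documents, string $on = 'email'): void
{
    $staging->drop();
    $staging->insertMany($documents);

    try {
        // aggregate() sends the command immediately; a real failure
        // surfaces here as an exception.
        $staging->aggregate(keepExistingMergePipeline($target, $on));
    } finally {
        $staging->drop();
    }
}
```

For the question's data this could be called as importKeepingExisting($client->test->users_import, 'users', $documents), where users_import is an arbitrary scratch name.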

Warlock answered 10/11, 2023 at 14:46

© 2022 - 2024 — McMap. All rights reserved.