Upsert Multiple Records with MongoDb
Asked Answered
I

3

6

I'm trying to get MongoDB to upsert multiple records with the following query, ultimately using MongoMapper and the Mongo ruby driver.

db.foo.update({event_id: { $in: [1,2]}}, {$inc: {visit:1}}, true, true)

This works fine if all the records exist, but does not create new records for records that do not exist. The following command has the desired effect from the shell, but is probably not ideal from the ruby driver.

[1,2].forEach(function(id) {db.foo.update({event_id: id}, {$inc: {visit:1}}, true, true) });

I could loop through each id I want to insert from within ruby, but that would necessitate a trip to the database for each item. Is there a way to upsert multiple items from the ruby driver with only a single trip to the database? What's the best practice here? Using mongomapper and the ruby driver, is there a way to send multiple updates in a single batch, generating something like the following?

db.foo.update({event_id: 1}, {$inc: {visit:1}}, true); db.foo.update({event_id: 2}, {$inc: {visit:1}}, true);

Sample Data:

Desired data after command if two records exist.

{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 11 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 2 }

Actual data after command if two records exist.

{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 11 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 2 }

Desired data after command if only the record with event_id 1 exists.

{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 2 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 1 }

Actual data after command if only the record with event_id 1 exists.

{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 2 }
Intercurrent answered 28/2, 2011 at 15:12 Comment(1)
Not exactly sure what the intended behaviour is. Is it "if there is an event, increase its visit count, otherwise create a new event and set its visit count to 1"? If so then what would the event_id be for newly inserted events?Chainplate
C
2

This - correctly - will not insert any records with event_id 1 or 2 if they do not already exist

db.foo.update({event_id: { $in: [1,2]}}, {$inc: {visit:1}}, true, true)

This is because the objNew part of the query (see http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers) does not have a value for field event_id. As a result, you will need at least X+1 trips to the database, where X is the number of event_ids, to ensure that you insert a record if one does not exist for a particular event_id (the +1 comes from the query above, which increases the visits counter for existing records). To say it in a different way, how does MongoDB know you want to use value 2 for the event_id and not 1? And why not 6?

W.r.t. batch insertion with ruby, I think it is possible as the following link suggests - although I've only used the Java driver: Batch insert/update using Mongoid?

Chainplate answered 28/2, 2011 at 16:9 Comment(1)
To clarify, writes for the bson spec are actually "batchable" by default. Most of the drivers support some form of batch updates / inserts, however the Mongoid is not a driver and may not support all features.Cascara
C
0

What you are after is the Find and Modify command with the upsert option set to true. See the example from the Mongo test suite (same one linked to in the Find and Modify docs) for an example that looks very much like what you describe in your question.

Capp answered 2/3, 2011 at 21:3 Comment(0)
A
-3

I found a way to do this using the eval operator for server-side code execution. Here is the code snippit:

def batchpush(body, item_opts = {})
    @batch << {
        :body => body,
        :duplicate_key => item_opts[:duplicate_key] || Mongo::Dequeue.generate_duplicate_key(body),
        :priority => item_opts[:priority] || @config[:default_priority]
    }
end

def batchprocess()
    js = %Q|
        function(batch) {
            var nowutc = new Date();
            var ret = [];
            for(i in batch){
                e = batch[i];
                //ret.push(e);
                var query = {
                    'duplicate_key': e.duplicate_key,
                    'complete': false,
                    'locked_at': null
                };
                var object = {
                    '$set': {
                        'body': e.body,
                        'inserted_at': nowutc,
                        'complete': false,
                        'locked_till': null,
                        'completed_at': null,
                        'priority': e.priority,
                        'duplicate_key': e.duplicate_key,
                        'completecount': 0
                    },
                    '$inc': {'count': 1}
                };

                db.#{collection.name}.update(query, object, true);
            }
            return ret;
        }
    |
    cmd = BSON::OrderedHash.new
    cmd['$eval'] = js
    cmd['args'] = [@batch]
    cmd['nolock'] = true
    result = collection.db.command(cmd)
    @batch.clear
    #pp result
end

Multiple items are added with batchpush(), and then batchprocess() is called. The data is sent as an array, and the commands are all executed. This code is used in the MongoDequeue GEM, in this file.

Only one request is made, and all the upserts happen server-side.

Arbor answered 29/8, 2011 at 22:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.