Every day, I receive a stock of documents (an update). What I want to do is insert each item that does not already exist.
- I also want to keep track of the first time I inserted them, and the last time I saw them in an update.
- I don't want to have duplicate documents.
- I don't want to remove a document which has previously been saved, but is not in my update.
- 95% (estimated) of the records are unmodified from day to day.
I am using the Python driver (pymongo).
What I currently do is (pseudo-code):
for each document in update:
existing_document = collection.find_one(document)
if not existing_document:
document['insertion_date'] = now
else:
document = existing_document
document['last_update_date'] = now
my_collection.save(document)
My problem is that it is very slow (40 mins for less than 100 000 records, and I have millions of them in the update). I am pretty sure there is something builtin for doing this, but the document for update() is mmmhhh.... a bit terse.... (http://www.mongodb.org/display/DOCS/Updating )
Can someone advise how to do it faster?