How safe is MongoDB's safe mode on inserts?

I am working on a project with some important data in it, which means we cannot afford to lose any of it if the power or the server goes down. We are using MongoDB for the database. I'd like to be sure that my data is in the database after the insert, and to roll back the whole batch if one element was not inserted. I know the philosophy behind Mongo is that we do not need transactions, but how can I make sure that my data is really safely stored after an insert rather than sent off to some "black hole"?

  • Should I query for the data afterwards to check it was stored?

  • Should I use some specific MongoDB commands?

  • Should I use sharding, even though one server is enough to satisfy
    our speed requirements, and even though it doesn't guarantee anything
    if the power goes down?

What is the best solution?

Shoveler answered 11/8, 2011 at 13:51 Comment(0)

Your best bet is to use Write Concerns - these allow you to tell MongoDB how important a piece of data is. The quickest Write Concern is also the least safe - the data is not flushed to disk until the next scheduled flush. The safest will confirm that the data has been written to disk on a number of machines before returning.

The write concern you are looking for is FSYNC_SAFE (at least that is what it is called from the point of view of the Java driver) or REPLICAS_SAFE which confirms that your data has been replicated.
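As a rough illustration, the same two levels can also be requested from the classic mongo shell via the getLastError command. This is only a sketch: it assumes a running mongod of that era, and the collection and field names are invented.

```javascript
// Assumes a running mongod and the classic mongo shell; names are illustrative.
db.orders.insert({ item: "widget", qty: 1 });

// Roughly FSYNC_SAFE: block until the write has been flushed to disk.
db.runCommand({ getLastError: 1, fsync: true });

// Roughly REPLICAS_SAFE: block until at least 2 replica set members
// have the write, giving up after 5 seconds.
db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 });
```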

Bear in mind that MongoDB does not have transactions in the traditional sense - your rollback will have to be performed by hand, as you can't tell the Mongo database to do this for you.

The other thing you need to do is either use the relatively new --journal option (which uses a Write Ahead Log), or use replica sets to share your data across many machines in order to maximise data integrity in the event of a crash/power loss.

Sharding is not so much a protection against hardware failure as a method for sharing the load when dealing with particularly large datasets; it shouldn't be confused with replica sets, which are a way of writing data to more than one disk on more than one machine.

Therefore, if your data is valuable enough, you should definitely be using replica sets, perhaps even siting slaves in other data centres/availability zones/racks/etc in order to provide the resilience you require.

There is/will be (I can't remember offhand whether this has been implemented yet) a way to specify the priority of individual nodes in a replica set, so that if the master goes down, the newly elected master is one in the same data centre if such a machine is available (i.e. to stop a slave on the other side of the country from becoming master unless it really is the only other option).
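For what it's worth, member priorities did land in replica sets. A hypothetical configuration that favours one data centre might look like the following sketch (the replica set name and host names are invented):

```javascript
// Hypothetical replica set config: dc1 members carry a higher priority,
// so one of them is preferred as master during an election.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "dc1-a.example.com:27017", priority: 2 },
    { _id: 1, host: "dc1-b.example.com:27017", priority: 2 },
    { _id: 2, host: "dc2-a.example.com:27017", priority: 1 }
  ]
});
```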

Archibold answered 11/8, 2011 at 13:58 Comment(2)
Thank you for a really good and vast reply. I will wait for other replies for some time, and if I do not find anything new I'll accept it, thanks – Shoveler
tl;dr: You're screwed any which way! – Bennie

I received a really nice answer from a person called GVP on Google Groups. I will quote it (basically it builds on Rich's answer):

I'd like to be sure that my data is in the database after the insert and rollback the whole batch if one element was not inserted.

This is a complex topic and there are several trade-offs you have to consider here.

Should I use sharding?

Sharding is for scaling writes. For data safety, you want to look at replica sets.

Should I use some specific mongoDB commands?

First thing to consider is "safe" mode or "getLastError()", as indicated by Andreas. If you issue a "safe" write, you know that the database has received the insert and applied the write. However, MongoDB only flushes to disk every 60 seconds by default, so the server can fail before the data reaches disk.
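A minimal sketch of such a "safe" write from the classic shell, assuming a running mongod (collection and field names are made up):

```javascript
// Insert, then ask the server whether it actually applied the write.
db.events.insert({ type: "login", at: new Date() });
var err = db.getLastError();  // null means the server accepted the write
if (err !== null) {
  print("insert failed: " + err);
}
```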

Second thing to consider is "journaling" (v1.8+). With journaling turned on, data is flushed to the journal every 100ms, so you have a smaller window of time before failure. The drivers have an "fsync" option (check that name) that goes one step further than "safe": it waits for acknowledgement that the data has been flushed to the disk (i.e. the journal file). However, this only covers one server. What happens if the hard drive on the server just dies? Well, you need a second copy.

Third thing to consider is replication. The drivers support a "W" parameter that says "replicate this data to N nodes" before returning. If the write does not reach "N" nodes before a certain timeout, the write fails (an exception is thrown). However, you have to configure "W" correctly based on the number of nodes in your replica set. Again, because a hard drive could fail even with journaling, you'll want to look at replication. Then there's replication across data centres, which is too long to get into here.

The last thing to consider is your requirement to "roll back". From my understanding, MongoDB does not have this "roll back" capacity. If you're doing a batch insert, the best you'll get is an indication of which elements failed.
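The "W" parameter described above is also exposed by the classic shell's getLastError helper, which accepts a w value and a timeout in milliseconds. A sketch, assuming a running replica set (names invented):

```javascript
db.payments.insert({ amount: 100 });
// Block until 3 replica set members have the write,
// or fail after 10 seconds.
db.getLastError(3, 10000);
```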

Here's a link to the PHP driver's documentation on this one: http://it.php.net/manual/en/mongocollection.batchinsert.php. You'll have to check the details on replication and the "W" parameter; I believe the same limitations apply here.

Shoveler answered 12/8, 2011 at 6:34 Comment(0)