Endless recovering state of secondary

I built a replica set with one primary, one secondary, and one arbiter on MongoDB 3.0.2. The primary and arbiter are on the same host; the secondary is on another host.

As the write load grows, the secondary can't keep up with the primary and falls into the RECOVERING state. Connectivity does not seem to be the issue: I can log in to the secondary with the mongo shell from the primary's host.

I stopped all operations, watched the secondary's state with rs.status(), and ran rs.syncFrom("primary's ip:port") on the secondary.

The rs.status() output then shows that the secondary's optimeDate is far behind the primary's, and one message appears intermittently:

"set" : "shard01", "date" : ISODate("2015-05-15T02:10:55.382Z"), "myState" : 3, "members" : [ { "_id" : 0, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 135364, "optime" : Timestamp(1431655856, 6), "optimeDate" : ISODate("2015-05-15T02:10:56Z"), "lastHeartbeat" : ISODate("2015-05-15T02:10:54.306Z"), "lastHeartbeatRecv" : ISODate("2015-05-15T02:10:53.634Z"), "pingMs" : 0, "electionTime" : Timestamp(1431520398, 2), "electionDate" : ISODate("2015-05-13T12:33:18Z"), "configVersion" : 3 }, { "_id" : 1, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 7, "stateStr" : "ARBITER", "uptime" : 135364, "lastHeartbeat" : ISODate("2015-05-15T02:10:53.919Z"), "lastHeartbeatRecv" : ISODate("2015-05-15T02:10:54.076Z"), "pingMs" : 0, "configVersion" : 3 }, { "_id" : 2, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 3, "stateStr" : "RECOVERING", "uptime" : 135510, "optime" : Timestamp(1431602631, 134), "optimeDate" : ISODate("2015-05-14T11:23:51Z"), "infoMessage" : "could not find member to sync from", "configVersion" : 3, "self" : true } ], "ok" : 1

"infoMessage" : "could not find member to sync from"

The primary and arbiter are both fine. I want to know the reason for this message and how to get the secondary from the RECOVERING state back to SECONDARY.

Heavyladen answered 15/5, 2015 at 3:4 Comment(3)
Please post the output of rs.status() and try to connect from the secondary in question to the configured port for mongod on the primary.Pneumatics
I can't attach a picture; the rs.status() output looks like this: "stateStr" : "PRIMARY", "optimeDate" : ISODate("2015-05-15T06:32:52Z"), "stateStr" : "RECOVERING", "optimeDate" : ISODate("2015-05-14T11:23:51Z")Heavyladen
Please edit your question and place the output of rs.status() there so that people can get more information about your problem more easily. And I agree with @MarkusWMahlberg: you should first make sure there are no connection issues between primary and secondary.Lichen

The problem (most likely)

The last operation on the primary is from "2015-05-15T02:10:56Z", whereas the last operation applied on the would-be secondary is from "2015-05-14T11:23:51Z", a difference of roughly 15 hours. That window may well exceed your replication oplog window (the difference between the time of the first and the last operation entry in your oplog). Put simply, there are too many operations on the primary for the secondary to catch up.
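You can check this directly from the mongo shell. A small diagnostic sketch: the first helper shows the oplog window on the primary ("log length start to end"), the second shows how far each member lags behind.

    // On the PRIMARY: the oplog window is reported as "log length start to end".
    db.printReplicationInfo()

    // On any member: prints how far each secondary is behind the primary.
    rs.printSlaveReplicationInfo()

If the lag reported for the recovering member is larger than the oplog window, it can no longer catch up by replaying the oplog.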

A bit more elaborated (though simplified): during an initial sync, the data the secondary syncs from is the data of a given point in time. When the data of that point in time has been synced over, the secondary connects to the oplog and applies the changes that were made between said point in time and now, according to the oplog entries. This works as long as the oplog holds all operations between that point in time and now. But the oplog has a limited size (it is a so-called capped collection). So if more operations happen on the primary during the initial sync than the oplog can hold, the oldest operations "fade out". The secondary recognises that not all operations necessary to "construct" the same data as the primary are available, refuses to complete the sync, and stays in the RECOVERING state.
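The oplog is an ordinary capped collection in the local database, so you can inspect it yourself; a quick sketch (the numbers are deployment-specific):

    use local
    db.oplog.rs.isCapped()   // true -- the oplog is a capped collection
    db.oplog.rs.stats()      // its stats include the configured maximum size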

The solution(s)

The problem is a known one and not a bug, but a result of the inner workings of MongoDB and several fail-safe assumptions made by the development team. Hence, there are several ways to deal with the situation. Sadly, since you only have two data-bearing nodes, all of them involve downtime.

Option 1: Increase the oplog size

This is my preferred method, since it deals with the problem once and (kind of) for all. It is a bit more complicated than the other solutions, though. From a high-level perspective, these are the steps to take; a condensed mongo shell sketch of them follows below.

  1. Shut down the primary
  2. Create a backup of the oplog using direct access to the data files
  3. Restart the mongod in standalone mode
  4. Copy the current oplog to a temporary collection
  5. Delete the current oplog
  6. Recreate the oplog with the desired size
  7. Copy back the oplog entries from the temporary collection to the shiny new oplog
  8. Restart mongod as part of the replica set

Do not forget to increase the oplog of the secondary before doing the initial sync, since it may become primary at some time in the future!

For details, please read "Change the size of the oplog" in the tutorials regarding replica set maintenance.
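To make the steps above more concrete, here is a condensed sketch of the mongo shell part, loosely following that 3.0 tutorial. It assumes the member has already been restarted as a standalone (step 3) and uses 2 GB purely as an example size; note that the official procedure keeps only the newest oplog entry as a seed rather than copying the whole oplog.

    // In the mongo shell of the standalone member (started without --replSet):
    use local

    // Keep the newest oplog entry as a seed for the recreated oplog.
    db.temp.drop()
    db.temp.save(db.oplog.rs.find({}, {ts: 1, h: 1}).sort({$natural: -1}).limit(1).next())

    // Drop the old oplog and recreate it as a capped collection with the new size.
    db.oplog.rs.drop()
    db.runCommand({create: "oplog.rs", capped: true, size: 2 * 1024 * 1024 * 1024})

    // Put the seed entry back, then restart mongod with its usual --replSet options.
    db.oplog.rs.save(db.temp.findOne())

The file-level backup from step 2 is your safety net in case anything goes wrong while local.oplog.rs is dropped and recreated.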

Option 2: Shut down the app during sync

If option 1 is not viable, the only real alternative is to shut down the application that puts load on the replica set, restart the sync, and wait for it to complete. Depending on the amount of data to be transferred, expect this to take several hours.
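While waiting, the member's state can be watched from any mongo shell connected to the set; a minimal sketch (3.0-era shell, so plain function syntax):

    // Print name, state and last applied optime of every member.
    // (The arbiter carries no data, so its optime shows as undefined.)
    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  " + m.optimeDate);
    });

The sync is done once the member reports SECONDARY again.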

A personal note

The oplog window problem is a well-known one. While replica sets and sharded clusters are easy to set up with MongoDB, quite some knowledge and a bit of experience are needed to maintain them properly. Do not run something as important as a database in a complex setup without knowing the basics - in case Something Bad (tm) happens, it might well lead to a FUBAR situation.

Pneumatics answered 15/5, 2015 at 10:51 Comment(5)
Thanks to Mahlberg, I will try your methods.Heavyladen
You saved my day! I had set my oplog size to 1 MB, which can only hold about 3 hours of operations. Now I need to set it to 30 MB. (My nodes are in the same data center, so lag is not a big problem :) )Hexagram
According to the current (Jul 2018) version of docs.mongodb.com/manual/tutorial/change-oplog-size, it seems you don't really need to shut down the nodes to adjust the size of the oplog.Mindszenty
@Mindszenty This is not true for the version of OP: docs.mongodb.com/v3.0/tutorial/change-oplog-sizePneumatics
@Markus W Mahlberg Sure. I should have said 'no need to shut down the nodes in the newer versions of MongoDB'. This page is the first hit when searching for this specific sync problem, so we might as well make it helpful for people visiting in 2019.Mindszenty

Another option (assuming the primary has healthy data) is to simply delete the data in the secondary's MongoDB data folder and restart it. This causes it to sync up with the primary again, as if you had just added it to the replica set.
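A rough sketch of that resync-by-wipe, with the OS-level steps as comments since they happen outside the mongo shell; the dbpath /data/db is an assumption and has to be replaced with your own:

    // 1. Shut down the secondary's mongod cleanly from its own mongo shell:
    use admin
    db.shutdownServer()
    // 2. On the secondary's host, move away or delete the contents of its dbpath
    //    (e.g. /data/db -- adjust to your configuration), then start mongod again
    //    with its usual --replSet options.
    // 3. From the primary's mongo shell, watch the initial sync until the member
    //    reports SECONDARY again:
    rs.status()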

Secundines answered 4/2, 2017 at 7:6 Comment(1)
If there is a lot of data, the estimated time to sync from primary to secondary may even show 2-3 years.Chinchin

Add a fourth new node to the replica set. Once it has synced then reset the stale secondary.
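From the primary's mongo shell this is essentially two commands; the host names below are placeholders, not taken from the question:

    // Add the new data-bearing member and let it run its initial sync.
    rs.add("new-host.example.net:27017")

    // Once it reports SECONDARY, remove (or wipe and re-add) the stale member.
    rs.remove("stale-host.example.net:27017")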

Amazement answered 1/3, 2019 at 20:28 Comment(0)
