When I stop the nodes of my replica set and start them up again, the primary node goes into the "RECOVERING" state.
I have a replica set that was created and running without authorization. To enable authorization, I added users with "db.createUser(...)" and enabled authorization in the configuration file:
security:
  authorization: "enabled"
Before stopping the replica set (and even after restarting the cluster without adding the security parameters), rs.status() shows:
{
"set" : "REPLICASET",
"date" : ISODate("2016-09-08T09:57:50.335Z"),
"myState" : 1,
"term" : NumberLong(7),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "192.168.1.167:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 301,
"optime" : {
"ts" : Timestamp(1473328390, 2),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T09:53:10Z"),
"electionTime" : Timestamp(1473328390, 1),
"electionDate" : ISODate("2016-09-08T09:53:10Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.1.168:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 295,
"optime" : {
"ts" : Timestamp(1473328390, 2),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T09:53:10Z"),
"lastHeartbeat" : ISODate("2016-09-08T09:57:48.679Z"),
"lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.676Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.1.167:27017",
"configVersion" : 1
},
{
"_id" : 2,
"name" : "192.168.1.169:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 295,
"optime" : {
"ts" : Timestamp(1473328390, 2),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T09:53:10Z"),
"lastHeartbeat" : ISODate("2016-09-08T09:57:48.680Z"),
"lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.054Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.1.168:27017",
"configVersion" : 1
}
],
"ok" : 1
}
To start using this configuration, I stopped each node as follows:
[root@n--- etc]# mongo --port 27017 --eval 'db.adminCommand("shutdown")'
MongoDB shell version: 3.2.9
connecting to: 127.0.0.1:27017/test
2016-09-02T14:26:15.784+0200 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
2016-09-02T14:26:15.785+0200 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14
After this shutdown, I confirmed that the process no longer exists by checking the output of ps -ax | grep mongo.
But when I start the nodes again and log in with my credentials, rs.status() now shows:
{
"set" : "REPLICASET",
"date" : ISODate("2016-09-08T13:19:12.963Z"),
"myState" : 3,
"term" : NumberLong(7),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "192.168.1.167:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 42,
"optime" : {
"ts" : Timestamp(1473340490, 6),
"t" : NumberLong(7)
},
"optimeDate" : ISODate("2016-09-08T13:14:50Z"),
"infoMessage" : "could not find member to sync from",
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.1.168:27017",
"health" : 0,
"state" : 6,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-09-08T13:19:10.553Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : NumberLong(0),
"authenticated" : false,
"configVersion" : -1
},
{
"_id" : 2,
"name" : "192.168.1.169:27017",
"health" : 0,
"state" : 6,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-09-08T13:19:10.552Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : NumberLong(0),
"authenticated" : false,
"configVersion" : -1
}
],
"ok" : 1
}
Why? Perhaps the shutdown command is not a good way to stop mongod; however, I also tested with 'kill pid', and the restart ends up in the same state.
In this state I don't know how to repair the cluster. I have started over (removing the dbpath files and reconfiguring the replica set), and I also tried '--repair', but it did not work.
Info about my system:
- MongoDB version: 3.2
- I start the process as root; perhaps it should run as the 'mongod' user?
- This is my start command:
mongod --conf /etc/mongod.conf
- keyFile configuration does not work; if I add "--keyFile /path/to/file", it shows:
"about to fork child process, waiting until server is ready for connections." The file has all permissions set, but mongod still cannot use the keyFile.
An example of the "net.bindIp" configuration, from mongod.conf on one machine:
net:
  port: 27017
  bindIp: 127.0.0.1,192.168.1.167
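For reference on the keyFile point above, here is a minimal sketch of how a keyfile for internal authentication is commonly generated and locked down, assuming `openssl` is installed. The path `./mongodb-keyfile` is a hypothetical placeholder. On Unix systems mongod refuses a keyfile that is readable by group or others, but whether that is the cause of the failure described here is an assumption; the output shown does not confirm it.

```shell
#!/bin/sh
# Hypothetical path; replace it with the location referenced by mongod.
KEYFILE="${KEYFILE:-./mongodb-keyfile}"

# Write 756 random bytes, base64-encoded (1008 characters plus line breaks).
openssl rand -base64 756 > "$KEYFILE"

# Restrict the file to read-only for its owner; group/other access is removed.
chmod 400 "$KEYFILE"

# Print the resulting mode and owner for a quick visual check.
ls -l "$KEYFILE"
```

The same file would then have to be copied to every member of the replica set, be owned by the user that runs mongod, and be referenced through security.keyFile in mongod.conf (or --keyFile on the command line) on each node.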
--keyFile /path/to/keyfile
arg. – Nebula