Mongos routing with ReadPreference=NEAREST

I'm having trouble diagnosing an issue where my Java application's requests to MongoDB are not getting routed to the Nearest replica, and I hope someone can help. Let me start by explaining my configuration.

The Configuration:

I am running a MongoDB deployment in production that is a sharded cluster. It currently has only a single shard (it hasn't grown big enough yet to require a split). This single shard is backed by a 3-node replica set. Two nodes of the replica set live in our primary data center. The third node lives in our secondary data center and is prohibited from becoming the primary.

We run our production application simultaneously in both data centers; however, the instance in our secondary data center operates in "read-only" mode and never writes data to MongoDB. It only serves client requests for reads of existing data. The objective of this configuration is to ensure that if our primary data center goes down, we can still serve client read traffic.

We don't want to waste all of this hardware in our secondary data center, so even in happy times we actively load balance a portion of our read-only traffic to the instance of our application running there. This application instance is configured with readPreference=NEAREST and is pointed at a mongos instance (version 2.6.7) running on localhost. The mongos instance is obviously configured to point at our 3-node replica set.
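
For reference, the driver-side setup looks roughly like the sketch below. This is a minimal sketch, not our exact code: the mongos port (27017) and the legacy Java driver API are assumptions; only readPreference=NEAREST and localhost are given above.

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;

public class ReadOnlyClientConfig {
    public static void main(String[] args) {
        // NEAREST should let mongos pick the lowest-latency member
        // (port 27017 is an assumption; adjust to your mongos port).
        MongoClientOptions options = MongoClientOptions.builder()
                .readPreference(ReadPreference.nearest())
                .build();
        MongoClient client = new MongoClient(new ServerAddress("localhost", 27017), options);
        System.out.println(client.getReadPreference()); // sanity-check the configured preference
        client.close();
    }
}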

From a mongos:

mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 4,
      "minCompatibleVersion" : 4,
      "currentVersion" : 5,
      "clusterId" : ObjectId("52a8932af72e9bf3caad17b5")
  }
  shards:
      {  "_id" : "shard1",  "host" : "shard1/failover1.com:27028,primary1.com:27028,primary2.com:27028" }
  databases:
      {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
      {  "_id" : "test",  "partitioned" : false,  "primary" : "shard1" }
      {  "_id" : "MyApplicationData",  "partitioned" : false,  "primary" : "shard1" }

From the failover node of the replicaset:

shard1:SECONDARY> rs.status()
{
    "set" : "shard1",
    "date" : ISODate("2015-09-03T13:26:18Z"),
    "myState" : 2,
    "syncingTo" : "primary1.com:27028",
    "members" : [
        {
            "_id" : 3,
            "name" : "primary1.com:27028",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 674841,
            "optime" : Timestamp(1441286776, 2),
            "optimeDate" : ISODate("2015-09-03T13:26:16Z"),
            "lastHeartbeat" : ISODate("2015-09-03T13:26:16Z"),
            "lastHeartbeatRecv" : ISODate("2015-09-03T13:26:18Z"),
            "pingMs" : 49,
            "electionTime" : Timestamp(1433952764, 1),
            "electionDate" : ISODate("2015-06-10T16:12:44Z")
        },
        {
            "_id" : 4,
            "name" : "primary2.com:27028",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 674846,
            "optime" : Timestamp(1441286777, 4),
            "optimeDate" : ISODate("2015-09-03T13:26:17Z"),
            "lastHeartbeat" : ISODate("2015-09-03T13:26:18Z"),
            "lastHeartbeatRecv" : ISODate("2015-09-03T13:26:18Z"),
            "pingMs" : 53,
            "syncingTo" : "primary1.com:27028"
        },
        {
            "_id" : 5,
            "name" : "failover1.com:27028",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 8629159,
            "optime" : Timestamp(1441286778, 1),
            "optimeDate" : ISODate("2015-09-03T13:26:18Z"),
            "self" : true
        }
    ],
    "ok" : 1
}


shard1:SECONDARY> rs.conf()
{
    "_id" : "shard1",
    "version" : 15,
    "members" : [
    {
        "_id" : 3,
        "host" : "primary1.com:27028",
        "tags" : {
            "dc" : "primary"
        }
    },
    {
        "_id" : 4,
        "host" : "primary2.com:27028",
        "tags" : {
            "dc" : "primary"
        }
    },
    {
        "_id" : 5,
        "host" : "failover1.com:27028",
        "priority" : 0,
        "tags" : {
            "dc" : "failover"
        }
    }
    ],
    "settings" : {
        "getLastErrorModes" : {"ACKNOWLEDGED" : {}}
    }
}

The Problem:

The problem is that requests which hit this mongos in our secondary data center seem to be getting routed to a member running in our primary data center, not to the nearest node, which runs in the secondary data center. This incurs a significant amount of network latency and results in poor read performance.

My understanding is that the mongos decides which node in the replica set to route the request to, and it's supposed to honor the ReadPreference from my Java driver's request. Is there a command I can run in the mongos shell to see the status of the replica set, including ping times to nodes? Or some way to see logging of incoming requests that indicates which node in the replica set was chosen, and why? Any advice at all on how to diagnose the root cause of my issue?

Royall answered 2/9, 2015 at 21:54 Comment(3)
How exactly do you prevent the server in the second data center from becoming primary? Could you post the output of sh.status() and rs.status()? – Mosenthal
You prevent a node from becoming primary with priority=0: docs.mongodb.org/master/core/replica-set-priority-0-member/… – Royall
There are several ways of preventing a node from becoming primary; I just wanted to make sure it wasn't hidden. ;) Interesting problem... – Mosenthal

If I start mongos with the flag -vvvv (4x verbose), it presents request-routing information in the log files, including the read preference used and the host to which each request was routed. For example:

2015-09-10T17:17:28.020+0000 [conn3] dbclient_rs say 
using secondary or tagged node selection in shard1, 
read pref is { pref: "nearest", tags: [ {} ] } 
    (primary : primary1.com:27028, 
    lastTagged : failover1.com:27028)
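
(If restarting mongos is inconvenient, the same effect can, as far as I know, be achieved at runtime via the logLevel server parameter. A sketch using a 3.x Java driver against the local mongos; the port is an assumption:)

import com.mongodb.MongoClient;
import org.bson.Document;

public class RaiseMongosVerbosity {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        // Equivalent of restarting with -vvvv: raise the log level to 4 at runtime.
        Document result = client.getDatabase("admin")
                .runCommand(new Document("setParameter", 1).append("logLevel", 4));
        System.out.println(result.toJson());
        client.close();
    }
}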
Royall answered 11/9, 2015 at 13:55 Comment(0)

When ReadPreference = NEAREST, the system does not simply look for the minimum network latency: it may decide the primary is the nearest member if the network connection is healthy. However, nearest read mode combined with a tag set selects the matching member with the lowest network latency. So with nearest, the chosen member may be either the primary or a secondary. How mongos behaves with configured read preferences, in terms of network latency, is not clearly explained in the official docs.

http://docs.mongodb.org/manual/core/read-preference/#replica-set-read-preference-tag-sets
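
To illustrate the tag-set variant with the tags already present in the question's rs.conf(), the Java driver configuration would look something like this (a sketch assuming a 3.x driver; the "dc" / "failover" tag values are taken from the question):

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;
import com.mongodb.Tag;
import com.mongodb.TagSet;

public class TaggedNearestConfig {
    public static void main(String[] args) {
        // nearest + tag set: only members tagged dc=failover are eligible,
        // and the lowest-latency match is preferred.
        ReadPreference nearestInFailoverDc =
                ReadPreference.nearest(new TagSet(new Tag("dc", "failover")));
        MongoClientOptions options = MongoClientOptions.builder()
                .readPreference(nearestInFailoverDc)
                .build();
        MongoClient client = new MongoClient(new ServerAddress("localhost", 27017), options);
        client.close();
    }
}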

Hope this helps.

Obscuration answered 11/9, 2015 at 4:26 Comment(0)

Despite the wording, when using nearest, the absolute fastest member isn't necessarily the one chosen. Instead, a random member is chosen from a pool of members whose latency falls within the calculated latency window.

The latency window is calculated by taking the fastest member's ping and adding replication.localPingThresholdMs, whose default is 15ms. You can read more about the algorithm here.
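
As a worked illustration of that window (a sketch only: the 49 ms and 53 ms pings come from the rs.status() output in the question, the 1 ms local ping is an assumption, and the 15 ms default is assumed unchanged):

import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;

public class LatencyWindowDemo {
    public static void main(String[] args) {
        // Approximate pings as a mongos in the secondary data center might see them.
        List<Map.Entry<String, Integer>> pings = List.of(
                new SimpleEntry<>("primary1.com:27028", 49),
                new SimpleEntry<>("primary2.com:27028", 53),
                new SimpleEntry<>("failover1.com:27028", 1));

        int localPingThresholdMs = 15; // mongos default
        int fastest = pings.stream().mapToInt(Map.Entry::getValue).min().getAsInt();

        // Members whose ping falls within [fastest, fastest + threshold] form the
        // pool; one of them is then chosen at random.
        pings.stream()
                .filter(e -> e.getValue() <= fastest + localPingThresholdMs)
                .forEach(e -> System.out.println("eligible: " + e.getKey()));
        // Here only failover1 (1 ms) falls inside the 1-16 ms window, so it
        // would be selected every time.
    }
}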

So what I do is combine nearest with tags, so that I can manually specify the member that I know is geographically closest.

Snowinsummer answered 21/6, 2021 at 15:59 Comment(0)
