Why is full text search of MongoDB shards directly much faster than going through the cluster manager (mongos) instance?

db.items.aggregate( [ { "$match" : { "$text" : { "$search" : "search terms"} } }, { "$project": { "type_id" : 1, "source_id": 1 } }, { "$facet" : { "types" : [ { "$unwind" : "$type_id"} , { "$sortByCount" : "$type_id"}] , "sources" : [ { "$unwind" : "$source_id"} , { "$sortByCount" : "$source_id"}]}} ] );

Can someone confirm these queries to shards are being run serially? Or offer some other explanation?

Without a shard key in the query, the query is sent to all shards and processed in parallel. However, the results from all shards will be merged at the primary shard, and thus it'll wait until the slowest shard returns.
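One rough way to check the serial-versus-parallel question yourself is to time the query on each shard directly and compare those numbers with the time seen through mongos. The sketch below is only illustrative arithmetic: the function name `compareFanOut`, the shard names, and the timings are all made up, and it ignores merge and network overhead.

```javascript
// Hypothetical per-shard timings in milliseconds, measured by running the
// same query directly against each shard. Names and numbers are made up.
function compareFanOut(shardMillis) {
  const times = Object.values(shardMillis);
  return {
    // Parallel scatter-gather: cannot finish before the slowest shard does.
    parallelFloor: Math.max(...times),
    // Serial execution: roughly the sum of all per-shard times.
    serialEstimate: times.reduce((sum, t) => sum + t, 0),
  };
}

const estimate = compareFanOut({ shardA: 120, shardB: 95, shardC: 310 });
console.log(estimate); // { parallelFloor: 310, serialEstimate: 525 }
```

If the time through mongos sits near `parallelFloor` plus some merge overhead, the shards are probably being queried in parallel; if it sits near `serialEstimate`, something else is going on.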

What are the pitfalls to querying the shards directly?

Querying the shards directly can return orphaned documents. A query via mongos also filters out orphaned documents to ensure data consistency, so it carries more overhead than querying each shard directly.

Measured using Robo 3T's query time

Robo 3T doesn't measure the query time correctly. By default, Robo 3T returns only the first 50 documents. With a driver, if the number of returned documents exceeds the default batch size, additional getMore requests are sent to the database to retrieve all of the documents. Robo 3T only gives you the first batch, i.e. a subset of the results.
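To time a query including every batch, not just the first one, you can drain the cursor yourself. This is a minimal sketch, assuming a cursor object that exposes synchronous hasNext() and next() methods as shell cursors do; `timeFullDrain` and `fakeCursor` are hypothetical helper names, and the fake cursor exists only so the snippet runs without a database.

```javascript
// Drain a cursor-like object completely and report how long it took.
// Assumes synchronous hasNext()/next(), as on cursors in the mongo shell.
function timeFullDrain(cursor) {
  const start = Date.now();
  let count = 0;
  while (cursor.hasNext()) {
    cursor.next(); // in the shell, this triggers getMore requests as batches run out
    count += 1;
  }
  return { count: count, millis: Date.now() - start };
}

// Stand-in cursor over an in-memory array, for illustration only.
function fakeCursor(docs) {
  let i = 0;
  return {
    hasNext: () => i < docs.length,
    next: () => docs[i++],
  };
}

const result = timeFullDrain(fakeCursor([{ _id: 1 }, { _id: 2 }, { _id: 3 }]));
console.log(result.count); // 3
```

In the shell you would pass a real cursor instead, for example `timeFullDrain(db.items.find({ "$text" : { "$search" : "search terms" } }))`, once against mongos and once against each shard.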

To evaluate your query, add explain('executionStats') to it. The performance hit is likely the data transfer between shards. Because the query lacks a shard key, the results from all shards have to be sent to one shard before merging. The total time includes not only the query time (the time the MongoDB engine takes to locate the documents) but also the document retrieval time.

Execute the command below and you'll see inputStages from each shard to better evaluate your query.

db.items.explain('executionStats').aggregate([
   { "$match" : { "$text" : { "$search" : "search terms" } } },
   { "$project" : { "type_id" : 1, "source_id" : 1 } },
   { "$facet" : {
      "types" : [ { "$unwind" : "$type_id" }, { "$sortByCount" : "$type_id" } ],
      "sources" : [ { "$unwind" : "$source_id" }, { "$sortByCount" : "$source_id" } ]
   } }
]);
