I have been very unhappy with full text search performance in MongoDB so I have been looking for outside-the-box solutions. With a relatively small collection of 25 million documents sharded across 8 beefy machines (4 shards with redundancy) I see some queries taking 10 seconds. That is awful. On a lark, I tried a 10 second query to the shards directly, and it seems like the mongos is sending the queries to shards serially, rather than in parallel. Across the 4 shards I saw search times of 2.5 seconds on one shard and the other 3 shards under 2 seconds each. That is a total of less than 8.5 seconds, but it took 10 through mongos. Facepalm.
Can someone confirm these queries to shards are being run serially? Or offer some other explanation?
What are the pitfalls to querying the shards directly?
We are on 4.0 and the query looks like this:
db.items.aggregate(
[
{ "$match" : {
"$text" : { "$search" : "search terms"}
}
},
{ "$project": { "type_id" : 1, "source_id": 1 } },
{ "$facet" : { "types" : [ { "$unwind" : "$type_id"} , { "$sortByCount" : "$type_id"}] , "sources" : [ { "$unwind" : "$source_id"} , { "$sortByCount" : "$source_id"}]}}
]
);
I made a mistake before, this is the query being sent that has the issue. And I talked to a MongoDB expert and was clued into a big part of what's going on (I think), but happy to see what others have to say so I can pay the bounty and make it official.