I am using Azure Search indexing for creating a faceted search of products. I have around 5 facets to aid in filtering the list of displayed products.
One thing I have noticed is that when a large number of products match, facet values with only a few matching products are not returned from the index.
For example (simplified), suppose my index had the following car manufacturers listed within a facet:
- Audi (312)
- BMW (203)
- Volvo (198)
- Skoda (4)
I would find that Skoda is not returned, since so few search results are linked to that manufacturer.
I can see this is the case when I search the index directly within the Azure Portal by using this query: facet=<facet-field-name>
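For anyone who wants to reproduce this outside the portal, here is a minimal sketch of how that query string could be built. The field name `manufacturer`, the `search=*` match-all text, and the helper name `build_facet_query` are placeholders for illustration, not taken from my actual index.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_query(facet_field, search_text="*"):
    """Return a URL-encoded query string that requests a single facet.

    `facet_field` is a placeholder for the real facetable field name.
    """
    return urlencode({"search": search_text, "facet": facet_field})


# Hypothetical field name; substitute the facetable field from your index.
query = build_facet_query("manufacturer")
print(query)

# Decoding the string again shows the two parameters that were sent.
print(parse_qs(query))
```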
After some research I came across the following explanation:
Facet counts can be inaccurate due to the sharding architecture. Every search index has multiple shards, and each shard reports the top N facets by document count, which is then combined into a single result. If some shards have many matching values, while others have fewer, you may find that some facet values are missing or under-counted in the results.
Although this behavior could change at any time, if you encounter this behavior today, you can work around it by artificially inflating the count: to a large number to enforce full reporting from each shard. If the value of count: is greater than or equal to the number of unique values in the field, you are guaranteed accurate results. However, when document counts are high, there is a performance penalty, so use this option judiciously.
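To check my understanding of the quoted explanation, I put together a small simulation. It is a deliberately simplified model, not how Azure Search is actually implemented: the two shards, the way the example manufacturers are split across them, and the function name `merged_facets` are all assumptions made up for illustration.

```python
from collections import Counter


def merged_facets(shards, count):
    """Merge facet counts when every shard reports only its top `count` values."""
    merged = Counter()
    for shard in shards:
        # Each shard contributes only its own top `count` facet values.
        for value, n in Counter(shard).most_common(count):
            merged[value] += n
    # The combined result is limited to `count` values as well.
    return merged.most_common(count)


# Hypothetical split of the example manufacturers across two shards.
# The totals match the list above: Audi 312, BMW 203, Volvo 198, Skoda 4.
shards = [
    {"Audi": 200, "BMW": 150, "Volvo": 90, "Skoda": 1},
    {"Audi": 112, "BMW": 53, "Volvo": 108, "Skoda": 3},
]

small = merged_facets(shards, count=2)
large = merged_facets(shards, count=10)

print(small)  # BMW is under-counted: one shard never reported it
print(large)  # count >= number of unique values, so totals are exact
```

In this model, a small `count` drops or under-counts values that fall outside one shard's top list, while a `count` at least as large as the number of unique values gives exact totals, which is consistent with what the quote describes.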
Based on the above quote, how do I artificially inflate the count to get around this issue? Or does anyone know a better approach?