I'm trying to find a way to prevent multiple posts by the same author from appearing in search results. So far I've tried random scoring, which lets me keep pagination. However, I can still get up to 4 posts by the same author on a given page of 10 results.
Is there any way to score a document based on how many times a certain field occurs in the result set? As far as I'm aware you cannot persist a variable or object in a scoring script.
I've looked into several ways of accomplishing this, but most of them have significant drawbacks. One is to remove the duplicates and query again for a new set of results that excludes the authors already shown. However, that second set can itself contain several posts by the same author. So I'm left querying one by one to replace duplicate authors in a result set, and this breaks deep pagination, because the result set used to replace duplicates eventually runs out of pages before the standard search does. I've also tried aggregations, which are not pageable.
Is there any functionality to spread out documents, or to subtract from a document's score, based on how many times a document with the same author (or field value) occurs?
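To make the behavior I'm after concrete, here is a minimal client-side sketch in Python. It is not a scoring script and it only works on hits that have already been fetched, so it does not solve the pagination problem. The field names `author` and `score`, the function name, and the `penalty` parameter are all hypothetical and chosen just for illustration.

```python
from collections import defaultdict


def demote_repeated_authors(hits, penalty=1.0):
    """Subtract `penalty` from a hit's score for each higher-scored hit by the same author.

    `hits` is a list of dicts with hypothetical keys "author" and "score".
    Returns a new list sorted by the adjusted score, highest first.
    """
    seen = defaultdict(int)  # author -> number of hits already counted
    adjusted = []
    # Walk hits from best to worst so an author's top post keeps its full score.
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        earlier = seen[hit["author"]]
        adjusted.append({**hit, "score": hit["score"] - penalty * earlier})
        seen[hit["author"]] += 1
    return sorted(adjusted, key=lambda h: h["score"], reverse=True)


if __name__ == "__main__":
    page = [
        {"id": 1, "author": "a", "score": 3.0},
        {"id": 2, "author": "a", "score": 2.9},
        {"id": 3, "author": "b", "score": 2.5},
    ]
    for hit in demote_repeated_authors(page):
        print(hit["id"], hit["author"])
```

In this sample the second post by author `a` drops below the post by author `b`. What I'm looking for is a way to get the same effect inside the query itself, so that `from` and `size` still page correctly.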
from and size for a bucket, but I cannot do that on a set of buckets. – Marva