How can I merge rankings from several Elasticsearch queries?

I would like to merge the rankings obtained from querying separate fields of an Elasticsearch index, so as to obtain a "compound" ranking.

As a (silly) "matchmaking" example, suppose I wanted to retrieve best-matching results on an index of people containing their favorite music, food, sports.

The separate queries could be e.g.

"query": { "match" : { "music" : "indie classical metal" } }

which would yield me as ranked results:

  1. Alice, 2. Bob, 3. Charlie;

"query": { "match" : { "foods" : "falafel strawberries coffee" } }

yielding

  1. Alice, 2. Charlie, 3. Bob;

and

"query": { "match" : { "sports" : "basketball ski" } }

yielding

  1. Charlie, 2. Alice, 3. Bob.

Now, I would like to obtain an "aggregate" ranking based on the rankings above, e.g. using the voting methods listed in "How to merge a collection of ordered preferences".
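To make the goal concrete, here is a minimal sketch of one such voting method, a Borda count, applied to the three example rankings above. The function name `borda_merge` and the alphabetical tie-break are assumptions made for illustration; the rankings are hard-coded rather than fetched from Elasticsearch.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked lists with a Borda count.

    In a list of n items, the item at 0-based position p earns n - 1 - p points.
    """
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            points[name] += n - 1 - position
    # Highest total first; ties are broken alphabetically (an arbitrary choice).
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


rankings = [
    ["Alice", "Bob", "Charlie"],  # music
    ["Alice", "Charlie", "Bob"],  # foods
    ["Charlie", "Alice", "Bob"],  # sports
]

merged = borda_merge(rankings)
print(merged)  # [('Alice', 5), ('Charlie', 3), ('Bob', 1)]
```

With these inputs the aggregate ranking would be 1. Alice, 2. Charlie, 3. Bob, which depends only on positions and not on the underlying scores.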

So far, to achieve something along these lines I used syntax for compound queries such as

"query": {
   "bool": {
        "should": [
                { "match" : { "music" : "indie classical metal" } },
                { "match" : { "foods" : "falafel strawberries coffee" } },
                { "match" : { "sports" : "basketball ski" } }
        ]
    }
 }

or

"query": {
   "dis_max": {
        "queries": [
                { "match" : { "music" : "indie classical metal" } },
                { "match" : { "foods" : "falafel strawberries coffee" } },
                { "match" : { "sports" : "basketball ski" } }
        ]
    }
 }

but (AFAIK) these don't do what I am looking for, which is to combine ranks rather than scores. I understand it's fairly straightforward to post-process the rankings (e.g. using elasticsearch-py and then a few lines of Python), but is it possible to do the things above directly with an Elasticsearch query?
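For reference, the client-side post-processing mentioned above could look roughly like the following sketch, which uses reciprocal rank fusion (RRF) as the merging rule. The function name `rrf_merge` and the constant `k=60` are assumptions chosen for illustration, and the per-field rankings are hard-coded here in place of the hit lists that separate elasticsearch-py searches would return.

```python
def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item.

    Ranks are 1-based; k dampens the influence of the very top positions.
    """
    scores = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Stand-ins for the ordered hits of the three separate "match" queries.
per_field_rankings = [
    ["Alice", "Bob", "Charlie"],  # music
    ["Alice", "Charlie", "Bob"],  # foods
    ["Charlie", "Alice", "Bob"],  # sports
]

fused = rrf_merge(per_field_rankings)
print(fused)  # ['Alice', 'Charlie', 'Bob']
```

Like the voting methods, this uses only rank positions, so it is unaffected by how differently the three fields are scored. Recent Elasticsearch versions also document a built-in reciprocal rank fusion feature, which may be worth checking for your version.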

(bonus question: could you suggest alternative strategies to merge rankings from multiple fields, beyond bool+should and dis_max that I could try out?)

Wurth answered 18/5, 2018 at 13:4

Answer #1. The Bool Query + the Terms Set + the Similarity Function

The first alternative strategy is to replace the default similarity function with a custom one.

We introduce the following document score (people ranking) model.

Let A be the set of terms in a document field, and let B be the set of terms in the query against that field.

Let fieldScore = f(A, B) = |A ∩ B|

Let documentScore = fieldScore1 * fieldBoost1 + fieldScore2 * fieldBoost2 + ...
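As a sanity check on this model, here is a small plain-Python sketch that evaluates it outside Elasticsearch, using the same three people and query terms that appear in this answer. The helper names `field_score` and `document_score` are made up for illustration.

```python
def field_score(field_terms, query_terms):
    """fieldScore = |A ∩ B|: number of query terms present in the field."""
    return len(set(field_terms) & set(query_terms))


def document_score(document, term_sets):
    """documentScore = sum over fields of fieldScore * fieldBoost."""
    return sum(
        field_score(document.get(ts["field"], []), ts["terms"]) * ts["boost"]
        for ts in term_sets
    )


people = [
    {"name": "Alice", "music": ["indie", "classical", "metal"],
     "foods": ["falafel", "strawberries", "coffee"],
     "sports": ["basketball", "ski"]},
    {"name": "Bob", "music": ["indie", "metal"],
     "foods": ["coffee"], "sports": ["basketball"]},
    {"name": "Charlie", "music": ["classical"],
     "foods": ["falafel", "coffee"],
     "sports": ["hockey", "basketball", "ski"]},
]

term_sets = [
    {"field": "music", "terms": ["indie", "classical"], "boost": 1},
    {"field": "foods", "terms": ["strawberries", "coffee"], "boost": 1},
    {"field": "sports", "terms": ["hockey", "basketball"], "boost": 1},
]

ranked = sorted(
    ((p["name"], document_score(p, term_sets)) for p in people),
    key=lambda pair: (-pair[1], pair[0]),
)
print(ranked)  # [('Alice', 5), ('Charlie', 4), ('Bob', 3)]
```

Note that this model still ranks by a (simplified) score, not by merging per-field rank positions.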

Mapping

PUT /ranking_people_similarity
{
    "settings": {
        "similarity": {
            "matched_term_count": {
                "type": "scripted",
                "script": {
                    "source": "doc.freq > 0 ? 1 : 0"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "music": {
                "type": "keyword",
                "similarity": "matched_term_count"
            },
            "foods": {
                "type": "keyword",
                "similarity": "matched_term_count"
            },
            "sports": {
                "type": "keyword",
                "similarity": "matched_term_count"
            }
        }
    }
}

Documents

PUT /ranking_people_similarity/_bulk
{"create":{"_id":1}}
{"name":"Alice","music":["indie","classical","metal"],"foods":["falafel","strawberries","coffee"],"sports":["basketball","ski"]}
{"create":{"_id":2}}
{"name":"Bob","music":["indie","metal"],"foods":["coffee"],"sports":["basketball"]}
{"create":{"_id":3}}
{"name":"Charlie","music":["classical"],"foods":["falafel","coffee"],"sports":["hockey","basketball","ski"]}

Ranking query with custom similarity function

GET /ranking_people_similarity/_search?filter_path=hits.hits
{
    "query": {
        "bool": {
            "should": [
                {
                    "terms_set": {
                        "music": {
                            "terms": [
                                "indie",
                                "classical"
                            ],
                            "minimum_should_match_script": {
                                "source": "1"
                            },
                            "boost": 1
                        }
                    }
                },
                {
                    "terms_set": {
                        "foods": {
                            "terms": [
                                "strawberries",
                                "coffee"
                            ],
                            "minimum_should_match_script": {
                                "source": "1"
                            },
                            "boost": 1
                        }
                    }
                },
                {
                    "terms_set": {
                        "sports": {
                            "terms": [
                                "hockey",
                                "basketball"
                            ],
                            "minimum_should_match_script": {
                                "source": "1"
                            },
                            "boost": 1
                        }
                    }
                }
            ]
        }
    },
    "fields": [
        "name"
    ],
    "_source": false
}

Response

{
    "hits" : {
        "hits" : [
            {
                "_index" : "ranking_people_similarity",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 5.0,
                "fields" : {
                    "name" : [
                        "Alice"
                    ]
                }
            },
            {
                "_index" : "ranking_people_similarity",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 4.0,
                "fields" : {
                    "name" : [
                        "Charlie"
                    ]
                }
            },
            {
                "_index" : "ranking_people_similarity",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 3.0,
                "fields" : {
                    "name" : [
                        "Bob"
                    ]
                }
            }
        ]
    }
}

Shiny answered 21/2 at 12:52

Answer #2. Pure Scripting

(See the document score model and the first strategy in Answer #1)

The second strategy is pure scripting

Mapping

PUT /ranking_people_scripted
{
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "music": {
                "type": "keyword"
            },
            "foods": {
                "type": "keyword"
            },
            "sports": {
                "type": "keyword"
            }
        }
    }
}

Documents (see Answer #1)

Ranking scripted query

GET /ranking_people_scripted/_search?filter_path=hits.hits
{
    "query": {
        "script_score": {
            "query": {
                "match_all": {}
            },
            "script": {
                "source": """
                    int calculateFieldScore(List fieldTerms, List queryTerms) {
                        def fieldScore = 0;
                        for (def queryTerm : queryTerms) {
                            if (fieldTerms.contains(queryTerm)) {
                                fieldScore++;
                            }
                        }
                        return fieldScore;
                    }
                    
                    def documentScore = 0;
                    def termSets = params.term_sets;
                    
                    for (def termSet : termSets) {
                        def queryTerms = termSet.terms;
                        def field = termSet.field;
                        def fieldBoost = termSet.boost;
                        def fieldTerms = doc[field];
                        
                        int fieldScore = calculateFieldScore(fieldTerms, queryTerms);
                        
                        documentScore += fieldScore * fieldBoost;
                    }
                    return documentScore;
                """,
                "params": {
                    "term_sets": [
                        {
                            "terms": [
                                "indie",
                                "classical"
                            ],
                            "field": "music",
                            "boost": 1
                        },
                        {
                            "terms": [
                                "strawberries",
                                "coffee"
                            ],
                            "field": "foods",
                            "boost": 1
                        },
                        {
                            "terms": [
                                "hockey",
                                "basketball"
                            ],
                            "field": "sports",
                            "boost": 1
                        }
                    ]
                }
            }
        }
    },
    "fields": [
        "name"
    ],
    "_source": false
}

Response

{
    "hits" : {
        "hits" : [
            {
                "_index" : "ranking_people_scripted",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 5.0,
                "fields" : {
                    "name" : [
                        "Alice"
                    ]
                }
            },
            {
                "_index" : "ranking_people_scripted",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 4.0,
                "fields" : {
                    "name" : [
                        "Charlie"
                    ]
                }
            },
            {
                "_index" : "ranking_people_scripted",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 3.0,
                "fields" : {
                    "name" : [
                        "Bob"
                    ]
                }
            }
        ]
    }
}

You could also implement the same logic as a runtime field or a script query.

Shiny answered 21/2 at 13:25

Have a look at Function Score Query - it should allow you to do what you’re looking for. But be aware that it might result in slower query execution.
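One possible reading of this suggestion, sketched as a Python dict that could be passed as a search body: each query term becomes a `function_score` function with a `term` filter and a `weight`, and `score_mode: sum` with `boost_mode: replace` makes the final score the weighted count of matched terms. This is an illustration, not the answer author's query. The helper name `build_function_score_query` is hypothetical, the term sets are borrowed from the other answers, and the `term` filters assume `keyword` fields as in those answers' mappings.

```python
import json


def build_function_score_query(term_sets):
    """Build a function_score body that adds `boost` for every matched term."""
    functions = []
    for term_set in term_sets:
        for term in term_set["terms"]:
            functions.append({
                # Assumes the field is mapped as keyword (exact term match).
                "filter": {"term": {term_set["field"]: term}},
                "weight": term_set["boost"],
            })
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",      # add up the weights of matching functions
                "boost_mode": "replace",  # ignore the score of the inner query
            }
        }
    }


term_sets = [
    {"field": "music", "terms": ["indie", "classical"], "boost": 1},
    {"field": "foods", "terms": ["strawberries", "coffee"], "boost": 1},
    {"field": "sports", "terms": ["hockey", "basketball"], "boost": 1},
]

body = build_function_score_query(term_sets)
print(json.dumps(body, indent=2))
```

As with `bool` + `should`, this combines scores rather than rank positions, so a true rank-based merge would still need client-side post-processing.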

Overzealous answered 20/5, 2018 at 18:23

Comment from Epistasis: You should probably provide a sample function_score query based on the OP's needs above.
