How do scoring profiles generate scores in Azure Search?

I want to add a scoring profile on my index on Azure Search. More specifically, every document in my index has a weight field of type Edm.Double, and I want to boost them according to this value. I don't want to just directly sort them with respect to weight because the relevance of the search term is also important.

So just to test it out, I created a scoring profile with a magnitude function with a boost value of 1000 (just to see if I understood how this works), linear interpolation, a starting value of 0 and an ending value of 1. What I was expecting was for the boost value to be added to the overall search score. So a document with weight 0.5 would get a boost of 500, whereas a document with weight 0.125 would get a boost of 125. However, the resulting scores were nowhere near that intuitive.

I have a couple of questions in this case:

1) How is the function score generated in this case? I have documents with weights close to each other (let's say 0.5465 and 0.5419), but the difference between their final scores is around 100-150, whereas I would expect it to be around 4-5.

2) How are function scores and weights aggregated into a final score for each search result?

Mckamey answered 2/1, 2017 at 13:40 Comment(0)

Thanks for providing the details. What were the base relevance scores of the two documents?

The boosting factor provided in the scoring profile is actually multiplied by the base relevance scores, which are computed using term frequencies. For example, suppose that the base scores of the two documents (given in @search.score in the response payload) were 0.5 and 0.2, and the values in the weight field were 0.5465 and 0.5419 respectively. With the scoring profile configuration given above (starting value 0, ending value 1, linear interpolation, and a boost factor of 1000), the final score for each document is computed as follows:

document 1: base_search_score (0.5) * boost_factor (1000) * (weight (0.5465) - min (0)) / (max - min) (1) = final_search_score (273.25)

document 2: base_search_score (0.2) * boost_factor (1000) * (weight (0.5419) - min (0)) / (max - min) (1) = final_search_score (108.38)
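
To make the arithmetic easy to check, here is a minimal Python sketch of the formula exactly as written in this answer. The function name `boosted_score` and its parameter names are made up for illustration; this is not Azure Search code, only the calculation described above.

```python
def boosted_score(base_score, boost, value, range_start, range_end):
    """Score as described in this answer: base * boost * linear position in range.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    # Linear interpolation: 0.0 at range_start, 1.0 at range_end.
    position = (value - range_start) / (range_end - range_start)
    return base_score * boost * position


# The two documents from the example above.
doc1 = boosted_score(0.5, 1000, 0.5465, 0, 1)
doc2 = boosted_score(0.2, 1000, 0.5419, 0, 1)
print(round(doc1, 2), round(doc2, 2))
```

With the values from the last comment on this answer (base 0.71231794, boost 2, field value 3, range 0 to 5) the sketch returns about 0.8548, not the 1.1397088 the commenter observed. That observed value happens to equal base * (1 + (boost - 1) * position), but nothing in this thread confirms that form, so treat either formula as something to verify against your own results.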

Please let me know if the final scores you get do not agree with the function above. Thanks!

Nate

Nerty answered 4/1, 2017 at 0:47 Comment(4)
Thanks Nate, this perfectly answers my question. One clarification I need, though, is about the use of weights for each field. If I have weights on my fields, the base score is calculated first with these weights, and then the formula you explained is applied to that base score. Is that correct? - Mckamey
If you are referring to field weights in the scoring profile, yes: the field weights are already factored into the base score, and then the additional boosting is applied. - Nerty
Thanks again Nate. One last thing: does a document with the minimum value in the scoring function always get a final score of 0? In my case weight - min may be 0, but I don't want such a document to be buried deep in the search results; I just want it to go unboosted and keep its original base score. So rather than multiplying the interpolated boost factor by the base score, I want to add them. Is there a way to do this? Or any other solution to this problem? - Mckamey
@NateKo - Coming from an Elasticsearch background, I'm trying my hand at Azure Search. I was playing with a scoring profile function but am not getting the score the equation above predicts. In my case, base_search_score = 0.71231794, boost_factor = 2, weight (field value) = 3, min = 0 and max = 5. So by the function above the new score should be (0.71231794 × 2 × (3 − 0)) ÷ 5 = 0.854781528, but the final score I get in the response is 1.1397088. Am I missing something, or has the equation changed in the latest version? - Geezer

The answer provided by Nate is difficult to understand and misses some components. I have made an overview of the entire scoring process, and it's quite complex.

When a user executes a search, the query is sent to Azure Search. Azure Search uses the TF-IDF algorithm to compute a score from 0 to 1, based on the tokens produced by the analyzer. Keep in mind that language-specific analyzers can produce multiple tokens for one word. A score is produced for every searchable field and then multiplied by that field's weight in the scoring profile. Finally, all weighted scores are summed, and that sum is the initial weighted score.

A scoring profile might also contain scoring functions. A scoring function can be a magnitude, freshness, distance (geo) or tag function. Multiple functions can be defined within one scoring profile.

The functions are evaluated, and their scores are aggregated by taking the sum, average, minimum, maximum or first matching value. The aggregated function score is then multiplied by the total weighted score, and that's the final score.
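
The process described so far can be sketched in a few lines of Python. This is only a model of the description in this answer, not the service's implementation: the helper names, the aggregation labels, the default weight of 1 for unweighted fields, and the reading of "first matching" as "first score in the list" are all assumptions made for illustration.

```python
from statistics import mean

# Aggregation modes named in the text; the keys are illustrative labels.
AGGREGATIONS = {
    "sum": sum,
    "average": mean,
    "minimum": min,
    "maximum": max,
    # Assumption: "first matching" is modelled as the first score in the list.
    "firstMatching": lambda scores: scores[0],
}


def weighted_text_score(field_scores, weights):
    """Sum of per-field scores, each multiplied by its weight (assumed default 1)."""
    return sum(score * weights.get(field, 1.0)
               for field, score in field_scores.items())


def final_score(field_scores, weights, function_scores, aggregation="sum"):
    """Weighted text score multiplied by the aggregated function scores."""
    text_score = weighted_text_score(field_scores, weights)
    if not function_scores:
        # Assumption: with no functions the weighted score is left unchanged.
        return text_score
    return text_score * AGGREGATIONS[aggregation](function_scores)


# Hypothetical field scores, weights and function scores.
scores = {"title": 0.5, "body": 0.2}
weights = {"title": 2.0}
print(final_score(scores, weights, [3.0, 5.0], "sum"))
print(final_score(scores, weights, [3.0, 5.0], "maximum"))
```

Keeping the aggregation modes in a dictionary makes it easy to compare how the same function scores change the final result under each mode.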

As an example, here is an index definition with two scoring profiles.

{  
  "name": "musicstoreindex",  
  "fields": [  
    { "name": "key", "type": "Edm.String", "key": true },  
    { "name": "albumTitle", "type": "Edm.String" },  
    { "name": "genre", "type": "Edm.String" },  
    { "name": "genreDescription", "type": "Edm.String", "filterable": false },  
    { "name": "artistName", "type": "Edm.String" },  
    { "name": "rating", "type": "Edm.Int32" },  
    { "name": "price", "type": "Edm.Double", "filterable": false },  
    { "name": "lastUpdated", "type": "Edm.DateTimeOffset" }  
  ],  
  "scoringProfiles": [  
    {  
      "name": "boostGenre",  
      "text": {  
        "weights": {  
          "albumTitle": 1.5,  
          "genre": 5,  
          "artistName": 2  
        }  
      }  
    },  
    {  
      "name": "newAndHighlyRated",  
      "functions": [  
        {  
          "type": "freshness",  
          "fieldName": "lastUpdated",  
          "boost": 10,  
          "interpolation": "linear",  
          "freshness": {  
            "boostingDuration": "P365D"  
          }  
        },  
        {
          "type": "magnitude",  
          "fieldName": "rating",  
          "boost": 8,  
          "interpolation": "linear",  
          "magnitude": {  
            "boostingRangeStart": 1,  
            "boostingRangeEnd": 5,  
            "constantBoostBeyondRange": false  
          }  
        }  
      ],
      "functionAggregation": 0
    }  
  ]
}

Let's say the entered query is meteora, the famous album by Linkin Park, and that we have the following document in our index.

{
    "key": "123",
    "albumTitle": "Meteora",
    "genre": "Rock",
    "genreDescription": "Rock with a flick of hiphop",
    "artistName": "Linkin Park",
    "rating": 4,
    "price": 30,
    "lastUpdated": "2020-04-01T00:00:00Z"
}

I'm not an expert on TF-IDF, but I can imagine that the following unweighted scores will be produced:

{
    "albumTitle": 1,
    "genre": 0,
    "genreDescription": 0,
    "artistName": 0
}

The scoring profile has a weight of 1.5 on the albumTitle field, so the total weighted score will be: 1 * 1.5 + 0 + 0 + 0 = 1.5

After that, the scoring profile functions are evaluated. In this case there are two. The first one evaluates freshness over a range of 365 days (one year). The lastUpdated field has a value of the 1st of April this year; let's say that's 50 days ago. The total range is 365 days, so you get a score of 1 if the last-updated date is today and 0 if it's 365 days or more in the past. In our case it's 1 - 50 / 365 = 0.8630... The boost of the function is 10, so the score for the first function is 8.630.
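
The freshness calculation above can be written as a small Python sketch. It follows the linear description in this answer (1 for today, 0 at the end of the boosting duration); the function name, the clamping to the range 0 to 1, and the chosen "today" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def freshness_score(last_updated, now, boosting_days, boost):
    """Linear freshness as described above: 1 today, 0 at boosting_days or older."""
    age_days = (now - last_updated).total_seconds() / 86400
    fraction = 1 - age_days / boosting_days
    # Clamp so that older (or future-dated) documents stay within 0..1.
    fraction = max(0.0, min(1.0, fraction))
    return boost * fraction


now = datetime(2020, 5, 21, tzinfo=timezone.utc)  # hypothetical "today"
last_updated = now - timedelta(days=50)           # updated 50 days ago
print(round(freshness_score(last_updated, now, 365, 10), 3))
```

A document updated 50 days ago comes out at roughly 8.63, matching the figure in the text, and anything 365 days or older drops to 0.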

The second function is a magnitude function with a range from 1 to 5. The document has a 4-star rating, which I score as 4 / 5 = 0.8. (Strictly, if 1 star maps to 0 and 5 stars maps to 1, linear interpolation gives (4 - 1) / (5 - 1) = 0.75, so treat 0.8 as an approximation.) The boost of the magnitude function is 8, so we multiply the value by 8: 0.8 * 8 = 6.4.

The functionAggregation is 0, which means we sum the results of all functions, giving a total function score of 6.4 + 8.630 = 15.03. The rule is then to multiply the total function score by the total weighted score of the fields, giving a grand total of 15.03 * 1.5 = 22.545.
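
For anyone who wants to re-run the numbers, the whole walkthrough fits in a short Python script. It is only a check of the arithmetic in this answer, using the values as given (including the 0.8 magnitude value); the variable names and the default weight of 1 for fields without a weight are assumptions.

```python
# Unweighted per-field scores and field weights from the walkthrough.
field_scores = {"albumTitle": 1.0, "genre": 0.0,
                "genreDescription": 0.0, "artistName": 0.0}
field_weights = {"albumTitle": 1.5, "genre": 5.0, "artistName": 2.0}

# Weighted text score: each field score times its weight (assumed 1 if absent).
weighted = sum(score * field_weights.get(field, 1.0)
               for field, score in field_scores.items())

# Function scores: boost times the normalized value used in the text.
freshness = 10 * (1 - 50 / 365)  # updated 50 days ago, 365-day range
magnitude = 8 * 0.8              # 4-star rating, value 0.8 as used in the text

# Aggregation by sum, then multiplied by the weighted text score.
functions_total = freshness + magnitude
final = functions_total * weighted

print(round(weighted, 3), round(functions_total, 2), round(final, 3))
```

The unrounded freshness value is carried through here, so the intermediate figures differ from the text only beyond the third decimal place.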

Hope you enjoyed this example.

Consonance answered 11/6, 2020 at 12:49 Comment(4)
The last line is confusing here: "The rule is then to multiply the total scoring profile functions score with the total weighted score of the fields giving us a grand total of: 15.03 + 1.5 = 16.53". Is it multiplication or addition with the weighted base score? - Moonseed
Please take a look at this detailed blog post. The total of the weighted score is multiplied with the aggregated functions score. dibranmulder.github.io/2020/09/22/… - Consonance
Shouldn't it be 15.03 * 1.5 = 22.545 in that case, i.e. multiplication of the total scoring profile functions score (15.03) with the weighted score of the fields (1.5)? You are doing addition here, 15.03 + 1.5 = 16.53. Am I missing something? - Moonseed
@PankajYadav Yes, I made a boo-boo; edited it. - Consonance