So the provided answer by Nate is difficult to understand and it misses some components. I
have made an overview of the entire scoring process, and its quite complex.
So when an user executes a search a query
is given to Azure Search. Azure search uses the TF-IDF
algorithm to determine a score from 0-1 based on Tokens
being formed by the Analyzer. Keep in mind that language specific analyzers can come up with multiple tokens for one word. For every searchable field the score will be produced and then multiplied
by the weight in the scoring profile. Lastly all weighted scores will be summed up and that's the initial weighted score.
A scoring profile might also contain scoring functions. The scoring function can be either a magnitude, freshness, geo or tag based function. Multiple functions can be made within one scoring profile.
The functions will be evaluated and the score from the functions can be either summed up, or taken the average, minimum, maximum or first matching. The total of all functions is then multiplied by the total weighted score and that's the final score.
An example, this is an example index with scoring profile.
{
"name": "musicstoreindex",
"fields": [
{ "name": "key", "type": "Edm.String", "key": true },
{ "name": "albumTitle", "type": "Edm.String" },
{ "name": "genre", "type": "Edm.String" },
{ "name": "genreDescription", "type": "Edm.String", "filterable": false },
{ "name": "artistName", "type": "Edm.String" },
{ "name": "rating", "type": "Edm.Int32" },
{ "name": "price", "type": "Edm.Double", "filterable": false },
{ "name": "lastUpdated", "type": "Edm.DateTimeOffset" }
],
"scoringProfiles": [
{
"name": "boostGenre",
"text": {
"weights": {
"albumTitle": 1.5,
"genre": 5,
"artistName": 2
}
}
},
{
"name": "newAndHighlyRated",
"functions": [
{
"type": "freshness",
"fieldName": "lastUpdated",
"boost": 10,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P365D"
}
},
{
"type": "magnitude",
"fieldName": "rating",
"boost": 8,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 1,
"boostingRangeEnd": 5,
"constantBoostBeyondRange": false
}
}
],
"functionAggregation": 0
}
]
}
Lets say the entered query is meteora
the famous album by Linkin Park. Lets say we have the following document in our index.
{
"key": 123,
"albumTitle": "Meteora",
"genre": "Rock",
"genreDescription": "Rock with a flick of hiphop",
"artistName": "Linkin Park",
"rating": 4,
"price": 30,
"lastUpdated": "2020-01-01"
}
I'm not an expert on TF-IDF but I can imagine that the following unweighted score will be produced:
{
"albumTitle": 1,
"genre": 0,
"genreDescription": 0,
"artistName": 0
}
The scoring profile has a weight of 1.5 on the albumTitle field, so the total weighted score will be: 1 * 1.5 + 0 + 0 + 0 = 1.5
After that the scoring profile functions will be evaluated. In this case there are 2. The first one evaluates the freshness with a range of 365 days, one year. The last updated field has a value of the 1st of April this year. Lets say thats 50 days from now. The total range is 365 so you will get a score of 1 if the last updated date is today. And a 0 if its 365 days or more in the past. In our case its 1 - 50 / 365 = 0.8630..
. The boost of the function is 10
so the score for the first function is 8.630
.
The second function is a magnitude function with a range from 1 to 5. The document got a 4 star rating so thats worth a score of 0.8, because a 1 star is 0 and 5 stars is 1. So a for 4 star is obviously 4 / 5 = 0.8
. The boost of the magnitude function is 8 so we have to multiple the value with 8. 0.8 * 8 = 6.4
.
The functionAggregation
is 0, which means we have to sum the results of all functions. Giving us a total score of scoring profile functions of: 6.4 + 8.630 = 15.03
. The rule is then to multiple the total scoring profile functions score with the total weighted score of the fields giving us a grand total of: 15.03 * 1.5 = 22.545
.
Hope you enjoined this example.