Other answers make good points (I upvoted), but I'd like to add some further color.
One should not attempt to infer details about a document from the value of its score (at least not with any standard TF/IDF or BM25 based similarity classes). The only thing these scores tell you is which documents are likely to be more relevant than others, assuming the scoring model's assumptions hold.
These models generally assume that "rare" words are more important than common words ("gold" usually matters more than "made" or "of", since many things are made and the word "of" appears in almost every document, but fewer things are gold...), and that documents in which a higher proportion of the words match the query are more relevant than documents with proportionally fewer matches (i.e. 12 matches in a 150 word document are probably more relevant than 14 matches in a 50,000 word document).
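As a rough illustration of that second assumption, here's a minimal sketch; a plain ratio is only the intuition, since the actual length normalization in Lucene's TF/IDF and BM25 is more sophisticated:

```python
# Proportion-of-matches intuition only; Lucene/BM25 length normalization
# is more sophisticated than a plain ratio.
short_doc = 12 / 150     # 12 matching terms in a 150-word document    -> 0.08
long_doc = 14 / 50_000   # 14 matching terms in a 50,000-word document -> 0.00028
print(short_doc > long_doc)  # True: the short document looks far more relevant
```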
"rare" is estimated by looking at the documents in the index (the system can't know about anything it hasn't indexed). Therefore the score for a document changes every time any document is added to the index. Either
- The new document contains one of the terms in the query you care about, or
- The new document does not contain one of the terms in the query you care about.
In the first case, the fraction of documents containing that term goes up (+1 to both numerator and denominator, so if 1 out of 2 matched before, 2 out of 3 match now). In the second case the number of documents goes up and the fraction goes down (1 out of 2 becomes 1 out of 3). Thus in case #1 the score of every previously matching document goes down and in case #2 the score of every previously matching document goes up, because the score is proportional to the inverse document frequency (IDF), which grows as the fraction of documents containing the term shrinks (BM25's IDF is trickier, but behaves similarly).
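To make that arithmetic concrete, here is a minimal sketch using the textbook idf = log(N/df); Lucene and Solr use smoothed variants (and BM25 its own formula), but the direction of the change is the same:

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    # Textbook inverse document frequency; Lucene/Solr use smoothed variants,
    # but the direction of change when documents are added is the same.
    return math.log(num_docs / doc_freq)

# Start: 2 documents in the index, 1 of them contains the term "gold".
print(idf(num_docs=2, doc_freq=1))  # ~0.69

# Case 1: the new document also contains "gold" -> 2 of 3 match,
# idf drops, so every previously matching document now scores lower.
print(idf(num_docs=3, doc_freq=2))  # ~0.41

# Case 2: the new document does not contain "gold" -> 1 of 3 match,
# idf rises, so every previously matching document now scores higher.
print(idf(num_docs=3, doc_freq=1))  # ~1.10
```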
Mostly it seems people ask this type of question after they've made the tactical error of displaying the document score in the results the user sees. The user, not being an information retrieval expert, has no idea what the number means. The user usually complains because they've made a guess about how it works and then found that their guess was wrong. Don't show the score to users, even if you've 'normalized' it. The score will only confuse them.
If you really need to ensure that you only get results where all of the terms match, you can set q.op=AND, but this runs a strong risk of users getting completely empty search results. Users are rarely happy with a blank search results page (there are exceptions, but they're rare), and users are probably not going to buy anything if they get no results, whereas they might buy the next best thing if you show it to them.
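For example, q.op can be sent as a plain request parameter to the select handler. Here's a minimal sketch using Python's requests against a hypothetical local core named "products" (the host, core name, and field list are assumptions, not something from your setup):

```python
import requests

# Hypothetical local Solr core; adjust the host, core name, and fields to your setup.
resp = requests.get(
    "http://localhost:8983/solr/products/select",
    params={
        "q": "iphone 6s 64GB gold",
        "q.op": "AND",          # require every term to match instead of the default OR
        "fl": "id,name,score",
    },
)
print(resp.json()["response"]["numFound"])
```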
You may still get things that look like false matches if you are stemming, using synonyms, or otherwise modifying tokens during analysis. "golden" and "gold" would likely both get stemmed to "gold", so with stemming your query of "iphone 6s 64GB gold" could also match a document with the text "golden opportunity to win a free case for galaxy note 9".
Scores are for sorting by relevancy. They are not good for anything else.
Finally, there IS a way to get at the information about which terms matched from the debug output, but forcing Solr to return that output is expensive and may lead to unacceptable query response times and a large increase in the size of the data transferred for query responses. This is the option of last resort because it is so costly. Very few use cases derive enough value from parsing this output to pay for the cost of producing it. Also, that output is for debugging and is somewhat more likely to change between Solr versions than the rest of the response (to reflect new features if nothing else), and that could make upgrades painful.
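If you do decide to pay that cost, the parameter is debugQuery=true, and the per-document explanations come back under the debug section of the response. A minimal sketch (same hypothetical "products" core as above):

```python
import requests

# debugQuery=true asks Solr to include scoring explanations; it is expensive,
# so keep it out of your normal production query path.
resp = requests.get(
    "http://localhost:8983/solr/products/select",
    params={"q": "iphone 6s 64GB gold", "fl": "id,score", "debugQuery": "true"},
)
body = resp.json()
# Per-document scoring explanations, keyed by the document's unique key.
for doc_id, explanation in body["debug"]["explain"].items():
    print(doc_id, explanation)
```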