how can I limit by score before sorting in a solr query

I am searching "product documents". In other words, my Solr documents are product records. I want to get, say, the top 50 matching products for a query, and then be able to sort those 50 top-scoring documents by name or price. I haven't found much on how to do this. Sorting by score and then by name or price won't really help, because scores are floats and exact ties are rare.

I wouldn't mind doing something like mapping the scores to ranges (e.g. a score of 8.0-8.99 would go into bucket 8), then sorting by bucket and then by name. But since scores are basically not normalized, even this would be tricky.

TL;DR: How do I exclude low-scoring documents from the Solr result set before sorting?
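
The score-bucketing idea described above could look roughly like this on the client side. This is a minimal sketch, assuming each hit is a dict with hypothetical `name`, `price` and `score` keys (e.g. fetched with `fl=name,price,score`); the optional `normalize` flag, which rescales scores relative to the best hit of the current result set, is an illustrative addition to work around the lack of normalization.

```python
import math
from typing import Any

Doc = dict[str, Any]  # one Solr hit; field names are assumptions


def bucket_then_sort(docs: list[Doc], normalize: bool = False) -> list[Doc]:
    """Order hits by integer score bucket (best first), then by name within a bucket."""
    if not docs:
        return []
    # Raw scores are not normalized across queries, so optionally rescale them
    # to 0..10 relative to the best hit of *this* result set before bucketing.
    max_score = max(d["score"] for d in docs)

    def sort_key(d: Doc) -> tuple[int, str]:
        if normalize and max_score > 0:
            score = d["score"] / max_score * 10
        else:
            score = d["score"]
        bucket = math.floor(score)  # e.g. 8.0-8.99 -> bucket 8
        return (-bucket, d["name"])  # higher bucket first, then A-Z by name

    return sorted(docs, key=sort_key)
```

Bucket width is the main design knob: wider buckets give the secondary sort (name or price) more influence, narrower ones keep the order closer to pure relevance.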

Compound asked 7/12, 2010 at 22:21

You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side).

Your query would be something along the lines of:

q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc

Set the l argument of the frange in the q parameter to the lower score bound you want to filter on, and replace the value of the qq parameter with your user query.
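
For illustration, here is one way to assemble that request from code so that the local-params syntax gets URL-encoded correctly. This is a sketch only: the helper name `frange_params`, the host, and the core name `products` are hypothetical placeholders, and `price` is assumed to be a sortable field in your schema.

```python
from urllib.parse import urlencode


def frange_params(user_query: str, min_score: float, sort: str = "price asc") -> dict[str, str]:
    """Build Solr params that drop hits scoring below min_score, then sort by `sort`."""
    return {
        # {!frange l=...} keeps only docs where query($qq), i.e. the relevance
        # score of the qq query, is at least the lower bound l.
        "q": f"{{!frange l={min_score}}}query($qq)",
        # The actual user query, referenced from q via the $qq parameter.
        "qq": user_query,
        # Applied to the surviving docs instead of score.
        "sort": sort,
    }


params = frange_params("awesome product", 5)
# Host and core name ("products") are placeholders for your own setup.
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
```

Passing the user query through a separate qq parameter means it never has to be escaped inside the frange local params.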

Idyllist answered 8/12, 2010 at 10:23
Thanks! Since I can get a reasonable frange lower bound from the first display of the results, sorted by score alone, this works great! – Compound

As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS, for example.

The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc. With only 50 rows it would also be very fast, so users can "play" with the sorting and filtering until they find what they want.
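
The same client-side idea, sketched in Python rather than JS: take the top N hits by score and re-order only those by another field. The helper name `resort_top_n` and the dict-per-hit shape are assumptions for illustration (e.g. hits fetched with `rows=50&sort=score desc&fl=*,score`).

```python
from typing import Any

Doc = dict[str, Any]  # one Solr hit; assumed to include a "score" key


def resort_top_n(docs: list[Doc], field: str, n: int = 50, descending: bool = False) -> list[Doc]:
    """Keep the n best-scoring hits, then re-order only those by `field`."""
    # Solr already returns hits by score desc when asked (sort=score desc, rows=n);
    # sorting again here just makes the function safe for unsorted input.
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    # Python's sort is stable, so ties on `field` keep their relevance order.
    return sorted(top, key=lambda d: d[field], reverse=descending)
```

For example, `resort_top_n(hits, "price")` gives the 50 most relevant products cheapest first, and `resort_top_n(hits, "name")` orders the same 50 alphabetically.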

Tellurium answered 10/12, 2010 at 9:48

I don't think you can simply

exclude low scoring documents from the solr result set before sorting

because the relevance score is only meaningful for a given combination of search query and resulting document list. That is, scores are only comparable within a single search, so you cannot set one threshold that works for all searches.

If you were using Java (or PHP), you could get the top 50 documents and then re-sort this list in your programming language, but I don't think you can do it with SOLR alone.

Anyway, I would recommend against re-sorting the results from SOLR this way, as it will simply confuse the user. People expect search results to behave like Google's (and those of most other search engines), where results come back in some form of TF-IDF ranking.

That said, you could use some other criterion to separate documents with the same relevance score, for example by adding an index-time boost factor based on a price-range scale.

I'd suggest playing to SOLR's strengths and using facets. Provide a price-range facet on the left (like eBay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget so users can sort the results by product name if they want to.
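
As a rough sketch of that setup, the request below keeps relevance ordering by default, asks Solr for a price-range facet plus a category facet, and only adds a sort parameter when the user picks one. The helper name, the `category` field, the range bounds and gap, and the `name_sort` field are all hypothetical; `name_sort` stands in for a non-tokenized (string) copy of the product name, since Solr cannot sort on a tokenized text field.

```python
from typing import Optional
from urllib.parse import urlencode


def faceted_search_params(user_query: str, sort: Optional[str] = None) -> list[tuple[str, str]]:
    """Solr params for a relevance-ranked search with price-range and category facets."""
    params = [
        ("q", user_query),
        ("facet", "true"),
        # Price-range facet: counts per bucket of width 100 between 0 and 1000
        # (the bounds and gap are illustrative assumptions).
        ("facet.range", "price"),
        ("facet.range.start", "0"),
        ("facet.range.end", "1000"),
        ("facet.range.gap", "100"),
        # Plain field facet on a hypothetical "category" field.
        ("facet.field", "category"),
    ]
    if sort:
        # Only sent when the user picks something in the sort widget;
        # otherwise Solr's default relevance ordering applies.
        params.append(("sort", sort))
    return params


query_string = urlencode(faceted_search_params("awesome product", sort="name_sort asc"))
```

A list of tuples is used instead of a dict so that repeated parameters (e.g. several `facet.field` entries) can be added later.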

[EDIT] this question might also be useful:

Digg-like search result ranking with Lucene / Solr?

Thalia answered 8/12, 2010 at 6:3
