Django, Haystack, Solr and Boosting
TL;DR

How do the various types of boosting work together in Django, django-haystack and Solr?

I am having trouble getting the most obvious search results to appear first. If I search for "caring for others" and get 10 results, the object titled "caring for others" appears second, after "caring for yourself".

Document Boosting

I have document-boosted Category objects by factor = 2.0 - ((the mptt tree level)/10), so 2.0 for root nodes (django-mptt gives them level 0), 1.9 for second-level nodes, 1.8 for third-level nodes, and so on (i.e. 200%, 190%, 180%, ...).
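The level-to-boost rule can be written as a small function. This is a sketch that mirrors the arithmetic in the index class's prepare() method; the function name category_boost is hypothetical, and it assumes django-mptt's convention that root nodes have level 0.

```python
def category_boost(level, base=2.0):
    """Document boost for a Category at a django-mptt tree level.

    Same arithmetic as prepare() in the index class: base minus one tenth
    of the level. django-mptt gives root nodes level 0.
    """
    return base - float(level) / 10
```

Because there is no lower bound, nodes deeper than level 10 would get a boost below 1.0, i.e. they would be penalized rather than promoted.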

Field Boosting

title is boosted with boost=1.5 (a positive factor of 150%) and content with boost=.5 (a negative factor of 50%).

Term Boosting

I am currently not boosting any search terms.
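For reference, here is where each kind of boost enters the score, assuming a Solr 3.x install (the version isn't stated here) using Lucene's classic TF-IDF similarity, which was the default at the time. This is a sketch of the formula from Lucene's Similarity documentation: document and field boosts are multiplied into each field's index-time norm, while a term boost is applied at query time.

$$
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t \in q}\Big(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^2\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)
$$

$$
\mathrm{norm}(t,d) = \mathrm{docBoost}(d)\cdot\frac{1}{\sqrt{\mathrm{numTerms}(f)}}\cdot\prod_{f'\in d,\ \mathrm{name}(f')=f}\mathrm{fieldBoost}(f'),
\qquad \mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)}
$$

Here f is the field the term t is searched in. Two consequences: index-time boosts only affect a score through a field that is actually searched, and the norm is stored in a single byte, so small differences such as 1.9 vs 1.8 may be rounded away. Fields indexed with omitNorms="true" ignore index-time boosts entirely.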

My Goal

I want a list of results containing both Categories and Articles (I'm ignoring Articles until I get my Category results straight), with Categories weighted higher than Articles and titles weighted higher than content. I'm also trying to weight root category nodes higher than child nodes.

I feel like I'm missing a key concept somewhere.

Information

I'm using haystack's built-in search form and search view.

I'm using the following package/lib versions:

Django==1.4.1
django-haystack==1.2.7
pysolr==2.1.0-beta

My Index Class

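# Haystack 1.x index definitions live in the app's search_indexes.py.
from haystack import site
from haystack.indexes import CharField, EdgeNgramField, SearchIndex

# Assumed model location, based on the template path below.
from categorization.models import Category


class CategoryIndex(SearchIndex):
    """Categorization -> Category"""
    # The document field; its text is rendered from the search template.
    text = CharField(document=True, use_template=True, boost=.5)
    title = CharField(model_attr='title', boost=1.5)
    content = CharField(model_attr='content', boost=.5)
    # Edge n-grams of the title, for autocomplete queries.
    autocomplete = EdgeNgramField(model_attr='title')

    def prepare_title(self, obj):
        return obj.title

    def prepare(self, obj):
        data = super(CategoryIndex, self).prepare(obj)
        # Document boost: 2.0 minus one tenth of the django-mptt tree level.
        base_boost = 2.0
        base_boost -= (float(int(obj.level))/10)
        data['boost'] = base_boost
        return data


site.register(Category, CategoryIndex)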

My search template, at templates/search/categorization/category_text.txt:

{{ object.title }}
{{ object.content }}

UPDATE

I noticed that when I took {{ object.content }} out of my search template, records started appearing in the expected order. Why is this?
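One plausible contributor, sketched under the classic Lucene 3.x scoring assumption and not verified against this index: the norm of the `text` field shrinks as 1/sqrt(field length), so a Category whose long content is rendered into `text` has its title matches diluted, by far more than the level-based document boosts can offset. Removing the content also removes any query terms that matched only there. The function names and document sizes below are hypothetical.

```python
import math


def length_norm(num_terms):
    """Lucene 3.x DefaultSimilarity length norm: 1 / sqrt(terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


def field_norm(num_terms, doc_boost=1.0, field_boost=1.0):
    """norm(t, d) for one field, before Lucene's lossy one-byte encoding."""
    return doc_boost * field_boost * length_norm(num_terms)


# Hypothetical documents: a 3-term title rendered into the `text` field,
# either alone or followed by 200 terms of content.
title_only = field_norm(3, doc_boost=1.9)
title_plus_content = field_norm(3 + 200, doc_boost=1.9)

# A 0.1 difference in document boost changes the norm by roughly 5.6%...
boost_ratio = 1.9 / 1.8
# ...while the content length changes it by a factor of about 8.
length_ratio = title_only / title_plus_content
```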

Isolt asked 4/9, 2012 at 20:39

The DisMax query parser (and, from Solr 3.1 on, ExtendedDisMax) was created exactly for these needs. You can configure all the fields you want searched (the 'qf' parameter), give each its own boost, and specify the fields where phrase hits are especially valuable (adding to the hit's score; the 'pf' parameter). You can also specify how many of the query's terms have to match, using a flexible rule pattern (the 'mm' parameter).
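DisMax gets its name from the disjunction-max query it builds: for each query term, a disjunction over the qf fields whose score is the best-scoring field plus `tie` times the others. `tie` is a real dismax parameter (default 0.0) that the config below does not set. A simplified sketch with hypothetical function names, ignoring coord, queryNorm and the extra phrase boost that `pf` adds on top:

```python
def dismax_term_score(field_scores, tie=0.0):
    """Score of one query term under DisMax.

    field_scores: the term's score in each qf field, already multiplied by
    that field's qf boost. The best field wins; `tie` adds a fraction of
    the other fields (tie=0.0 is a pure maximum).
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def dismax_query_score(per_term_field_scores, tie=0.0):
    """Sum of the per-term disjunction-max scores (simplified)."""
    return sum(dismax_term_score(scores, tie) for scores in per_term_field_scores)
```

With tie=0.0, a match in title and content at the same time scores no higher than the better of the two, which is what keeps a long content field from swamping the title.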

For example, the configuration could look like this (part of a request handler entry in solrconfig.xml; I'm not familiar with how to do this through Haystack, so this is plain Solr):

<str name="defType">dismax</str>
<str name="q.alt">*:*</str>
<str name="qf">text^0.5 title^1.5 content^0.5</str>
<str name="pf">text title^2 content</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
<int name="ps">100</int>
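The same settings can also be sent as per-request parameters, which override a handler's `defaults` section and make it easy to experiment before editing solrconfig.xml. A sketch using only the standard library; the endpoint URL and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def dismax_params(user_query):
    """The request-handler settings above, as per-request Solr parameters."""
    return {
        "q": user_query,
        "defType": "dismax",
        "q.alt": "*:*",
        "qf": "text^0.5 title^1.5 content^0.5",
        "pf": "text title^2 content",
        "fl": "*,score",
        "mm": "100%",
        "ps": 100,
    }


def dismax_url(user_query):
    """Full select URL with the parameters URL-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode(dismax_params(user_query))


url = dismax_url("caring for others")
```

pysolr's `Solr.search(q, **kwargs)` forwards extra keyword arguments as request parameters, so passing this dict (minus `q`) that way should be equivalent; it is worth checking against your pysolr version.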

I don't know Haystack well, but it seems to provide DisMax support: https://github.com/toastdriven/django-haystack/pull/314

See the DisMax documentation (which also links to ExtendedDisMax): http://wiki.apache.org/solr/DisMaxQParserPlugin and http://wiki.apache.org/solr/ExtendedDisMax

Tzar answered 20/9, 2012 at 8:46
Thanks, this pointed out a couple of things about boosting I hadn't heard of yet. – Isolt

It seems that you are just trying to be too smart here with all those boosts.

For example, the field boosts are completely useless if you are using the default search view. In fact, auto_query, which the view runs by default, searches only one field: the one marked document=True. Haystack actually refers to that field as content internally, so I would suggest renaming your content field in the search index to avoid any possible conflicts.

If that doesn't help (it probably won't), you will need to create a custom search form, or use a simple workaround to get what you want: place the field you want to boost in the template multiple times:

{{ object.title }}
{{ object.title }}
{{ object.content }}
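Assuming Lucene 3.x's classic similarity (tf = sqrt(freq), length norm = 1/sqrt(field length)), here is a sketch of what the duplicated title buys for a title term in the `text` field. The helper names and the 3-term title are hypothetical, idf is ignored because it is the same either way, and the term is assumed not to occur in the content.

```python
import math


def title_term_weight(freq, field_length):
    """tf * lengthNorm for one term in the `text` field (Lucene 3.x classic):
    sqrt(freq) / sqrt(number of terms in the field)."""
    return math.sqrt(freq) / math.sqrt(field_length)


TITLE_TERMS = 3  # hypothetical 3-word title


def gain_from_duplicating_title(content_terms):
    """Factor by which repeating the title once changes a title term's weight."""
    once = title_term_weight(1, TITLE_TERMS + content_terms)
    twice = title_term_weight(2, 2 * TITLE_TERMS + content_terms)
    return twice / once
```

With no content the gain is exactly 1 (the extra frequency is cancelled by the longer field), and it approaches sqrt(2), about 1.41, as the content grows, so the workaround helps most for documents with long content.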
Trogon answered 12/9, 2012 at 20:33
Putting the title in the full-text record twice seems like a pretty bad idea. Once you do that, you have to reindex the entire set if you change the scoring scheme. Besides, boost is a built-in part of Solr; its purpose is to provide a dynamic scoring scheme. I want to know how to use the tool, not work around it. That said, thanks for weighing in. – Isolt
As I wrote, you must create a custom SearchForm that searches all the fields instead of only the default one. – Trogon
I'm going to award the bounty to you so that the rep doesn't disappear into the ether. I don't plan on accepting the answer, though. I'm still waiting for an answer that explains the proper usage of boost and how boost factors into a Solr score. – Isolt
