I don't understand the results being returned from Elasticsearch/Haystack

The results being returned from Haystack, using an Elasticsearch backend, seem erroneous to me. My search index is as follows:

from haystack import indexes
from .models import IosVideo

class VideoIndex(indexes.SearchIndex, indexes.Indexable):
    # Document field; its content is rendered from search/indexes/<app_label>/iosvideo_text.txt
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    absolute_url = indexes.CharField(model_attr='get_absolute_url')
#    content_auto = indexes.EdgeNgramField(model_attr='title')
    description = indexes.CharField(model_attr='description')
#    thumbnail = indexes.CharField(model_attr='thumbnail_url', null=True)

    def get_model(self):
        return IosVideo

    def index_queryset(self, using=None):
        # Only public videos are indexed
        return self.get_model().objects.filter(private=False)

My document template for the text field looks like:

{{ object.title }}
{{ object.text }}
{{ object.description }}

My query is:

SearchQuerySet().models(IosVideo).filter(content="darby")[0]

The result that makes me think this is not working is a video object with the following fields:

title: u'Cindy Daniels'
description: u''
text: u'Cindy Daniels\n\n\n',
absolute_url: u'/videos/testimonial/cindy-daniels/'

Why in the world would the query return such a result? I'm very confused.

My current theory is that it's breaking the query into every substring of characters and treating each one as a partial match. Is there a way to tighten this so that only closer matches are returned?
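To make the theory concrete, here is a pure-Python sketch (no Elasticsearch involved) of what would happen if an edge n-gram analyzer were applied to both the indexed text and the query, with any shared token counting as a hit. The gram sizes (min_gram=2, max_gram=15) and the "any overlap matches" rule are assumptions for illustration, not a reading of Haystack's actual settings.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=15):
    """Lowercase, split into words, and emit front-anchored n-grams for each word."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        # Prefixes only: "darby" -> "da", "dar", "darb", "darby" (with min_gram=2)
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:size])
    return grams


def shared_tokens(query, document, min_gram=2):
    """Tokens produced for both query and document; in this toy model any overlap is a match."""
    return edge_ngrams(query, min_gram) & edge_ngrams(document, min_gram)


print(shared_tokens("darby", "Cindy Daniels"))              # {'da'}
print(shared_tokens("darby", "Cindy Daniels", min_gram=3))  # set()
```

In this toy model the short prefix "da" is enough to tie "darby" to "Daniels"; raising min_gram (or not n-gramming the query at all) removes the spurious overlap.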

My pip packages are elasticsearch==1.2.0 and django-haystack==2.3.1, and the Elasticsearch server version is 1.3.1.

Additionally, when I query the local server directly with http://localhost:9200/haystack/_search/?q=darby&pretty it returns 10 results, whereas

SearchQuerySet().filter(content="darby")

returns 4k results.
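When comparing these two numbers, note that Elasticsearch's _search endpoint returns only 10 hits per page by default, so the 10 documents listed are not necessarily the total; the hits.total value in the response is the figure to compare with Haystack's count. A minimal standard-library sketch, assuming the index is named haystack on localhost:9200 as in the question; search_url, total_hits and fetch_total are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(query, index="haystack", host="http://localhost:9200"):
    """Build a URI-search URL such as http://localhost:9200/haystack/_search?q=darby."""
    return "%s/%s/_search?%s" % (host, index, urlencode({"q": query}))


def total_hits(response):
    """Return the total match count, not the size of the returned page."""
    total = response["hits"]["total"]
    # Elasticsearch 1.x reports an integer; 7.x and later report {"value": n, "relation": ...}.
    return total["value"] if isinstance(total, dict) else total


def fetch_total(query, **kwargs):
    """Query a running Elasticsearch node and return its total hit count."""
    with urlopen(search_url(query, **kwargs)) as resp:
        return total_hits(json.loads(resp.read().decode("utf-8")))
```

With a node running, fetch_total("darby") can be compared against SearchQuerySet().filter(content="darby").count() to see whether the gap is real or just pagination.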

Does any one know what would cause this type of behavior?

Sedation answered 5/3, 2015 at 21:37 Comment(5)
Are you, by any chance, using elasticstack, or a custom analyzer? That could explain the results you're seeing. I'm sure you saw, but the default lookup in filter as of Haystack 2.X is contains, rather than exact. That, plus an analyzer that looks at partial words, could potentially match that document. - Putrid
No custom analyzer :( My pip packages are elasticsearch==1.2 and django-haystack==2.3.1. The Elasticsearch version is 1.3.1. - Sedation
Have you tried querying Elasticsearch directly to compare the results? For example http://localhost:9200/_search/?q=darby where search is your index name. - Shanley
Did you inspect what the indexed documents in Elasticsearch contain, in this case, e.g., the document for Cindy Daniels? - Commonplace
@LucasMoeskops localhost:9200/haystack/_search/?q=darby returns 10 results, and none of them is the Cindy Daniels object. So something is very amiss with Haystack then, correct? - Sedation
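Following up on the suggestion to inspect what Elasticsearch actually indexed, the _analyze API shows which tokens a field's analyzer produces for a given string; comparing the tokens for darby with those for Cindy Daniels would show whether any overlap exists. This sketch assumes the 1.x-style query-string parameters (field, text) and the index name haystack; newer Elasticsearch versions expect these in a JSON request body instead. analyze_url, token_list and overlap are hypothetical helpers.

```python
from urllib.parse import urlencode


def analyze_url(text, field="text", index="haystack", host="http://localhost:9200"):
    """Build an _analyze URL that runs `text` through the analyzer mapped to `field`."""
    return "%s/%s/_analyze?%s" % (host, index, urlencode({"field": field, "text": text}))


def token_list(response):
    """Pull the token strings out of a decoded _analyze JSON response."""
    return [t["token"] for t in response.get("tokens", [])]


def overlap(query_tokens, doc_tokens):
    """Tokens produced for both the query and the document, sorted for stable output."""
    return sorted(set(query_tokens) & set(doc_tokens))
```

Fetching analyze_url("darby") and analyze_url("Cindy Daniels") (for example with urllib.request.urlopen), decoding the JSON and passing each result through token_list gives the two token lists to feed into overlap.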

There is a problem with the filter() method on CharField indexes in django-haystack 2.1.0. You can change them to NgramField instead, for example text = indexes.NgramField(document=True, use_template=True).

The problem is that with this combination only the first character of the query is used, so it returns every document that has a 'd' in its text index field.
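The answer's explanation can be illustrated with a toy comparison: if only the query's first character survives, a substring test for 'd' matches far more documents than a whole-word test for darby. This is a pure-Python illustration with made-up titles (only Cindy Daniels comes from the question), not Haystack's actual query code.

```python
def whole_word_match(query, text):
    """True if the full query appears as a separate word in the text."""
    return query.lower() in text.lower().split()


def first_char_match(query, text):
    """True if just the query's first character appears anywhere in the text."""
    return query[:1].lower() in text.lower()


# Hypothetical sample titles; only "Cindy Daniels" is taken from the question.
titles = ["Cindy Daniels", "Darby Jones", "Mark Evans"]

print([t for t in titles if whole_word_match("darby", t)])  # ['Darby Jones']
print([t for t in titles if first_char_match("darby", t)])  # ['Cindy Daniels', 'Darby Jones']
```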

Nobles answered 29/4, 2015 at 14:55 Comment(0)
