Django Haystack - Filter by substring of a field using SearchQuerySet()

I have a Django project that uses SOLR for indexing.

I'm trying to do a substring search using Haystack's SearchQuerySet class.

For example, when a user searches for the term "ear", it should return the entry that has a field with the value: "Search". As you can see, "ear" is a SUBSTRING of "Search". (obviously :))

In other words, in a perfect Django world I would like something like:

SearchQuerySet().all().filter(some_field__contains_substring='ear')

In the haystack documentation for SearchQuerySet (https://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#field-lookups), it says that only the following FIELD LOOKUP types are supported:

  • contains
  • exact
  • gt, gte, lt, lte
  • in
  • startswith
  • range

I tried using __contains, but it behaves exactly like __exact, which looks up the exact word (the whole word) in a sentence, not a substring of a word.

I am confused, because such functionality is pretty basic, and I'm not sure whether I'm missing something or there is another way to approach this problem (using regex or something?).

Thanks

Oceania answered 18/12, 2013 at 13:9 Comment(0)

That can be done using an EdgeNgramField:

some_field = indexes.EdgeNgramField() # also prepare value for this field or use model_attr

Then for partial match:

SearchQuerySet().all().filter(some_field='ear')
Jepson answered 18/12, 2013 at 19:2 Comment(3)
Thank you! Your answer is not 100% correct, but it led me in the right direction. The solution was to use NgramField rather than EdgeNgramField, like this: some_field = indexes.NgramField(model_attr='some_field'). The EdgeNgramField can only do "starts with" and "ends with" type of filtering. - Oceania
I wasn't working with Solr, but using NgramField worked for me with Elasticsearch. - Rogation
EdgeNgramField does not work like __contains: it works by stemming and finds other matches based on the stems, so it yields a much fuzzier result set than contains. - Jojo
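
To see why NgramField can match "ear" inside "Search" while EdgeNgramField cannot, here is a minimal pure-Python sketch of the two tokenization styles. It is only an illustration: the helper names and the size limits (3 to 15 for n-grams, 2 to 15 for edge n-grams) and the lowercasing are assumptions, and the real tokens depend on how your backend schema is configured.

```python
def ngrams(text, min_size=3, max_size=15):
    """Return every contiguous substring of `text` with length in [min_size, max_size]."""
    text = text.lower()  # assumption: the index lowercases tokens
    grams = set()
    for size in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def edge_ngrams(text, min_size=2, max_size=15):
    """Return only the prefixes of `text` with length in [min_size, max_size]."""
    text = text.lower()  # assumption: the index lowercases tokens
    return {text[:size] for size in range(min_size, min(max_size, len(text)) + 1)}


if __name__ == "__main__":
    print("ear" in ngrams("Search"))       # True: "ear" is an inner 3-gram
    print("ear" in edge_ngrams("Search"))  # False: only prefixes are indexed
    print("sea" in edge_ngrams("Search"))  # True: "sea" is a prefix
```

Under these assumptions, a query term only matches if it equals one of the indexed tokens, which is why the prefix-only variant behaves like a "starts with" filter.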

It's a bug in haystack.

As you said, __contains is implemented exactly like __exact, and therefore this functionality does not exist out of the box in Haystack.

The fix is awaiting merge here: https://github.com/django-haystack/django-haystack/issues/1041

Until a fixed release is available, you can work around it like this:

from haystack.inputs import BaseInput, Clean
from haystack.query import SearchQuerySet


class CustomContain(BaseInput):
    """
    An input type for making wildcard matches.
    """
    input_type_name = 'custom_contain'

    def prepare(self, query_obj):
        query_string = super(CustomContain, self).prepare(query_obj)
        query_string = query_obj.clean(query_string)

        exact_bits = [Clean(bit).prepare(query_obj) for bit in query_string.split(' ') if bit]
        query_string = u' '.join(exact_bits)

        return u'*{}*'.format(query_string)

# Usage:
SearchQuerySet().filter(content=CustomContain('searchcontentgoeshere'))
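
For readers who want to see what string the workaround hands to the backend, the following is a rough stand-alone sketch of just the string handling in prepare(). The function name wildcard_wrap is hypothetical, and the sketch deliberately skips the backend cleaning and escaping that query_obj.clean and Clean perform in the real class, so treat it as an approximation rather than the actual Haystack behavior.

```python
def wildcard_wrap(query_string):
    """Collapse runs of spaces and wrap the query in leading and trailing wildcards.

    Hypothetical helper: mirrors the split/join/format steps of
    CustomContain.prepare(), without any backend-specific escaping.
    """
    bits = [bit for bit in query_string.split(' ') if bit]  # drop empty pieces
    return '*{}*'.format(' '.join(bits))


if __name__ == "__main__":
    print(wildcard_wrap('ear'))            # *ear*
    print(wildcard_wrap('  foo   bar '))   # *foo bar*
```

A leading wildcard such as `*ear*` typically forces the search engine to scan many terms, so on large indexes the NgramField approach from the other answer may perform better.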
Jojo answered 21/10, 2015 at 13:23 Comment(0)