Django-Haystack with Solr contains search

4

8

I am using Haystack in a project with Solr as the backend. I want to perform a "contains" search, similar to Django's .filter(something__contains="...").

The __startswith option does not suit our needs because, as the name suggests, it only matches words that start with the string.

I tried to use something like *keyword*, but Solr does not allow * as the first character of a query term.

Thanks.

Hooded answered 14/6, 2011 at 0:14 Comment(2)
Is 'keyword' a whole word, or are you trying to search for partial words? (Dispirited)
Solution pasted here: https://mcmap.net/q/1326997/-django-haystack-filter-by-substring-of-a-field-using-searchqueryset (Milli)
10

To get "contains" functionality you can use:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />

as index analyzer.

This will create back-edge n-grams (suffixes) for every whitespace-separated word in your field. For example:

"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!

As you can see, this will expand your index greatly, but if you now enter a query like:

"nde*"

it will match "ndex" giving you a hit.
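
To make the matching concrete, here is a minimal pure-Python sketch of what the analyzer chain above produces and why a trailing-wildcard query finds a hit. It does not talk to Solr. The function names are made up for illustration, and the prefix match is only an approximation of how Solr evaluates a query such as nde*.

```python
def back_edge_ngrams(text, min_gram=1, max_gram=100):
    """Mimic whitespace tokenizing, back-edge n-grams, then lowercasing."""
    grams = []
    for token in text.split():
        longest = min(max_gram, len(token))
        for size in range(min_gram, longest + 1):
            # Take the last `size` characters of the token (a suffix).
            grams.append(token[-size:].lower())
    return grams


def prefix_query_hits(grams, query):
    """Return the indexed grams matched by a trailing-wildcard query like 'nde*'."""
    prefix = query.rstrip("*").lower()
    return [gram for gram in grams if gram.startswith(prefix)]


grams = back_edge_ngrams("Index this!")
print(grams)
print(prefix_query_hits(grams, "nde*"))
```

The first print shows the same ten terms listed in the example above, and the second shows that only the suffix "ndex" satisfies the prefix match.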

Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, the index will not grow as much, but the "contains" functionality is reduced. For instance, setting minGramSize="3" means a contains query needs at least 3 characters to match reliably.
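
As a rough illustration of that trade-off, the sketch below counts how many terms back-edge n-gramming would emit for the sample text under different size limits. It is plain Python under simplifying assumptions (whitespace tokens, no deduplication), and the function name is hypothetical. Real index size also depends on how Solr stores terms.

```python
def count_back_edge_ngrams(text, min_gram, max_gram):
    """Count the suffix n-grams emitted for each whitespace-separated token."""
    total = 0
    for token in text.split():
        longest = min(max_gram, len(token))
        # Tokens shorter than min_gram contribute no grams at all.
        total += max(0, longest - min_gram + 1)
    return total


sample = "Index this!"
for min_gram, max_gram in [(1, 100), (3, 100), (3, 4)]:
    count = count_back_edge_ngrams(sample, min_gram, max_gram)
    print(f"minGramSize={min_gram} maxGramSize={max_gram}: {count} terms")
```

Tightening either bound reduces the number of indexed terms, at the cost of the shortest or longest substrings no longer being present in the index.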

Miraculous answered 14/6, 2011 at 7:31 Comment(0)
2

You can achieve the same behavior without touching the Solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood, this generates a schema similar to what lindstromhenrik suggested.

Menswear answered 18/4, 2013 at 12:28 Comment(0)
0

I am using an expression like: .filter(something__startswith='...') .filter_or(name=''+s'...'), as it seems Solr does not like an expression like '...*' on its own, but combining the two with an OR works.

Sailesh answered 25/1, 2013 at 11:23 Comment(0)
0

None of the answers here do a real substring search (*keyword*).

They don't find a keyword that sits in the middle of a bigger string (neither a prefix nor a suffix).

Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do a "startswith" or an "endswith" type of filtering.

The solution is to use an NgramField like this:

from haystack import indexes

class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    # Index every n-gram of the model attribute, not only its edge n-grams.
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...

This is very elegant, because you don't need to manually add anything to the schema.xml.
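
To illustrate the difference between the two field types, here is a small pure-Python sketch comparing front-edge n-grams with full n-grams. It assumes the query term is compared exactly against the indexed grams (no wildcard). The gram sizes 3 and 15 are illustrative assumptions, not values read from a Haystack-generated schema, and the function names are made up.

```python
def edge_ngrams(word, min_gram=3, max_gram=15):
    """Front-edge n-grams: only the prefixes of the word."""
    longest = min(max_gram, len(word))
    return {word[:size] for size in range(min_gram, longest + 1)}


def all_ngrams(word, min_gram=3, max_gram=15):
    """Every substring whose length lies between min_gram and max_gram."""
    grams = set()
    longest = min(max_gram, len(word))
    for size in range(min_gram, longest + 1):
        for start in range(len(word) - size + 1):
            grams.add(word[start:start + size])
    return grams


word = "keyword"
print("key" in edge_ngrams(word))  # prefix: present in both gram sets
print("ywo" in edge_ngrams(word))  # mid-word substring: missing from edge grams
print("ywo" in all_ngrams(word))   # mid-word substring: present in full n-grams
```

Under these assumptions a mid-word fragment such as "ywo" only exists as an indexed term when every n-gram is generated, which is the behaviour the NgramField relies on.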

Sleek answered 19/12, 2013 at 18:27 Comment(0)
