Django text search with partial sentence match

6

8

I am building a site in which I want to implement text search over the title and description of some objects. Since I will only have a small number of objects (~500 documents), I am not considering Haystack and the like.

I only need 2 features:

  • Be able to prioritize matches on the title over the description (with some kind of weight).
  • Allow partial matches of the sentence. For example, if I search for 'ice cream', also return the results for 'ice' and 'cream'.

I have looked into django-watson and django-full-text-search but I am not sure if they allow partial matching. Any ideas?

Occident answered 21/7, 2012 at 16:42 Comment(1)
What is the underlying database? - Deandre
3

How many hits per second does your site get? How much data does each document store?

If we are talking about 500 documents and a few hits per minute, perhaps the Django ORM is enough:

from django.db.models import Q

# AND together one (title OR description) condition per search word.
q = Q()
for word in search_string.split():
    q &= Q(title__icontains=word) | Q(description__icontains=word)

result = Document.objects.filter(q)

Have you considered this option?

Be careful:

  • This approach does not prioritize the title over the description.
  • Only documents that match all of the words appear in the results.
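
Both caveats could be worked around in plain Python for a collection this small. The following is only a minimal sketch, assuming every document exposes title and description strings. The Doc class, the function names, and the 3:1 weights are hypothetical and chosen for illustration. It does a case-insensitive substring check (similar in spirit to icontains), keeps documents that match at least one word, and ranks title hits above description hits.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a model instance with these two text fields.
    title: str
    description: str


def score(doc, search_string, title_weight=3, description_weight=1):
    """Add a weight for every search word found in the title or description."""
    title = doc.title.lower()
    description = doc.description.lower()
    total = 0
    for word in search_string.lower().split():
        if word in title:
            total += title_weight
        if word in description:
            total += description_weight
    return total


def rank(docs, search_string):
    """Return the docs that match at least one word, best score first."""
    scored = [(score(doc, search_string), doc) for doc in docs]
    matches = [pair for pair in scored if pair[0] > 0]
    # list.sort() is stable, so documents with equal scores keep their order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches]
```

With a Django model, something like rank(list(Document.objects.all()), search_string) would apply the same idea, at the cost of loading every row into memory on each search.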
Newcastle answered 21/7, 2012 at 19:32 Comment(2)
Thanks for the answer. I will use this simple approach for now and maybe later move to a better solution such as the ones pointed out below. - Occident
OK, let us know about the code's performance once your site is in production. - Newcastle
7

Using the new full-text search in django.contrib.postgres as a starting point, one can extend SearchQuery to create a version that matches on a prefix of the final word:

from psycopg2.extensions import adapt
from django.contrib.postgres.search import SearchQuery


class PrefixedPhraseQuery(SearchQuery):
    """
    Alter the tsquery executed by SearchQuery
    """

    def as_sql(self, compiler, connection):
        # Join the words with & (AND) and prefix-match the last one with :*
        # Postgres 9.6+ also offers <-> (FOLLOWED BY) for phrase matching
        value = adapt('%s:*' % ' & '.join(self.value.split()))

        if self.config:
            config_sql, config_params = compiler.compile(self.config)
            template = 'to_tsquery({}::regconfig, {})'\
                .format(config_sql, value)
            params = config_params

        else:
            template = 'to_tsquery({})'\
                .format(value)
            params = []

        if self.invert:
            template = '!!({})'.format(template)
    
        return template, params

Refer to the Postgres docs for the tsquery syntax.

You can then use it in a query like so:

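from django.contrib.postgres.search import SearchVector

vector = SearchVector(
    'first_name',
    'last_name',
    'email',
    config='simple')
# `query` starts out as the raw search string entered by the user.
query = PrefixedPhraseQuery(query, config='simple')
queryset = queryset\
    .annotate(vector=vector)\
    .filter(vector=query)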

You could also write a startswith lookup; refer to the implementation of SearchVectorExact for an example.

Django 3+ Answer

This has become much simpler in more recent versions of Django. SearchQuery now has a raw mode that can be used to request a prefix query.

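from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F

query = SearchQuery("search & term & prefix:*", search_type="raw")
# Parentheses let the method chain span several lines without backslashes.
results = (
    Model.objects
    .filter(_search_vector=query)
    .annotate(
        rank=SearchRank(
            F("_search_vector"),
            query,
            cover_density=True,
        )
    )
    .order_by("-rank")
)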

Here _search_vector is a SearchVectorField on the model; alternatively, a SearchVector can be annotated onto the queryset.
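
In raw mode the string is passed to to_tsquery as-is, so free text typed by a user generally has to be turned into valid tsquery syntax first. A minimal sketch of one way to do that follows. The helper name build_prefix_tsquery and its operator parameter are hypothetical, not part of Django. It keeps only word characters, joins the words with the chosen operator, and adds the :* prefix marker to the last word.

```python
import re


def build_prefix_tsquery(search_string, operator="&"):
    """Turn free text into a raw tsquery string that prefix-matches the last word."""
    # Keep only runs of word characters, so tsquery operators typed by the
    # user (such as & | ! : and parentheses) cannot break the query syntax.
    words = re.findall(r"\w+", search_string)
    if not words:
        # Nothing searchable; the caller should skip the search in this case.
        return ""
    return (" %s " % operator).join(words) + ":*"
```

The result could then be used as SearchQuery(build_prefix_tsquery(user_input), search_type="raw"). Passing operator="|" instead produces an "any word" query, which is closer to the 'ice' or 'cream' behaviour asked for in the question.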

Cruz answered 19/4, 2017 at 3:51 Comment(4)
Postgres only supports prefix search. You cannot simply extend it to do partial search. - Acquainted
Starting with Postgres 9.6 you can use the proximity operator <->. - Thighbone
It works great, but unfortunately it does not work in Python 3.9. Do you have any idea how to update this? I have posted the question #65952000. - Husbandry
Looks like SearchVectorExact no longer exists in Django 3.x+, there's just SearchVector: docs.djangoproject.com/en/3.2/ref/contrib/postgres/search/… - Whodunit
3

As the creator of django-watson, I can confirm that, with some database backends, it allows partial matches. Specifically, on MySQL and PostgreSQL, it allows prefix matching, which is a partial match from the beginning of a word.

Check out this database comparison page on the wiki:

https://github.com/etianen/django-watson/wiki/Database-support

Ileneileo answered 5/2, 2013 at 10:5 Comment(0)
2

Check out this article. It has information about what you are trying to do.

Take a look at Haystack as well. Whoosh seems to be a good option.

Partridge answered 21/7, 2012 at 18:21 Comment(1)
+1, Haystack is great. Although I would consider using Solr instead of Whoosh: much more work to set up, but much more horsepower. - Nun
1

Full-text search is now supported by Django: Django Full Text Search.

IMPORTANT: It seems this is only available for the PostgreSQL database backend.

# Example based on the Django docs.
from django.contrib.postgres.search import SearchVector

Entry.objects.annotate(
    search=SearchVector('title', 'description'),
).filter(search='some_text')

You could also use the search lookup on a single field:

Entry.objects.filter(title__search='Cheese')
Waterford answered 28/5, 2018 at 17:44 Comment(0)
0

I have used Apache Solr in my projects; it is very good and well documented. Also check out sunburnt, pysolr and solrpy.

Stull answered 22/7, 2012 at 14:40 Comment(0)
