Django haystack whoosh super slow

I have a simple setup with django-haystack and the whoosh engine. A search yielding 19 objects took 8 seconds. I used django-debug-toolbar to determine that I had a bunch of repeated queries.

I then updated my search view to prefetch relations, so that duplicate queries would not happen:

class MySearchView(SearchView):
    template_name = 'search_results.html'
    form_class = SearchForm
    queryset = RelatedSearchQuerySet().load_all().load_all_queryset(
        models.Customer, models.Customer.objects.all().select_related('customer_number').prefetch_related(
            'keywords'
        )
    ).load_all_queryset(
        models.Contact, models.Contact.objects.all().select_related('customer')
    ).load_all_queryset(
        models.Account, models.Account.objects.all().select_related(
            'customer', 'account_number', 'main_contact', 'main_contact__customer'
        )
    ).load_all_queryset(
        models.Invoice, models.Invoice.objects.all().select_related(
            'customer', 'end_customer', 'customer__original', 'end_customer__original', 'quote_number', 'invoice_number'
        )
    ).load_all_queryset(
        models.File, models.File.objects.all().select_related('file_number', 'customer').prefetch_related(
            'keywords'
        )
    ).load_all_queryset(
        models.Import, models.Import.objects.all().select_related('import_number', 'customer').prefetch_related(
            'keywords'
        )
    ).load_all_queryset(
        models.Event, models.Event.objects.all().prefetch_related('customers', 'contracts', 'accounts', 'keywords')
    )

But even then, the search still takes 5 seconds. I then used the profiler from django-debug-toolbar, which gave me this information:

[Image: Django debug toolbar profiler results]

From what I can tell, the issue lies in haystack/query:779::__getitem__, which is hit twice, each costing 1.5 seconds. I have glanced through the code in question, but cannot make sense of it. So where do I go from here?

Semiliquid answered 24/8, 2015 at 13:16 Comment(6)
How many objects are in your search index? - Capitalist
@Capitalist "19 objects" - Meerschaum
It seems like __getitem__ triggers 2 queries to the database. Take a look at _fill_cache: it calls get_results twice, which takes about 3 seconds overall. Are you sure that all items have been prefetched from the database? - Specialty
I also used to build search with whoosh and haystack; because of performance issues we moved to elasticsearch (elastic.co/products/elasticsearch). - Madra
I agree with @Taras, it seems like the query is not prefetched. - Scoville
Just want to add to @booksapp's comment: if you're working on a production project and considering python+elasticsearch, I'd strongly caution against using Haystack in between. Haystack provides a nice abstraction layer and gets you going quickly, but it makes advanced queries and indexing much more difficult. You'll regret it down the road. Instead, look at a first-class library like github.com/elastic/elasticsearch-dsl-py - Hilaire

You say in the question:

I then updated my search view to prefetch relations […]

The code you present, though, does not use QuerySet.prefetch_related for most of them. Instead, your sample code uses QuerySet.select_related for most of them, which is a different mechanism: it joins the related rows into the same SQL query and only works for single-valued relations, so it cannot load many-to-many or reverse relations.

The documentation for each of those methods is extensive and can help you decide which is correct for your case.

In particular, the QuerySet.prefetch_related documentation says:

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.
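As a rough illustration of the difference described in that quote (this is not part of the Django documentation, and it is not how Django is implemented internally), the two strategies can be sketched with plain sqlite3. The tables and rows below are made up for the example and only loosely mirror the Customer, Contact and keywords names from the question.

```python
import sqlite3

# Hypothetical schema and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT,
                          customer_id INTEGER REFERENCES customer(id));
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE customer_keywords (customer_id INTEGER, keyword_id INTEGER);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO contact VALUES (1, 'Alice', 1), (2, 'Bob', 2);
    INSERT INTO keyword VALUES (1, 'vip'), (2, 'overdue');
    INSERT INTO customer_keywords VALUES (1, 1), (1, 2), (2, 1);
""")

# select_related style: a single query, with the single-valued relation
# (contact -> customer) pulled in through a JOIN.
contacts = conn.execute(
    "SELECT contact.name, customer.name FROM contact "
    "JOIN customer ON customer.id = contact.customer_id "
    "ORDER BY contact.id"
).fetchall()

# prefetch_related style: one query for the parent rows, one more query for
# all related rows at once, then the 'joining' is done in Python.
customers = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
ids = [cid for cid, _ in customers]
placeholders = ", ".join("?" for _ in ids)
rows = conn.execute(
    "SELECT ck.customer_id, keyword.word FROM customer_keywords AS ck "
    "JOIN keyword ON keyword.id = ck.keyword_id "
    f"WHERE ck.customer_id IN ({placeholders}) ORDER BY keyword.id",
    ids,
).fetchall()

# Group the related rows by parent id in Python.
keywords_by_customer = {cid: [] for cid in ids}
for cid, word in rows:
    keywords_by_customer[cid].append(word)

print(contacts)
print(keywords_by_customer)
```

Either way the number of queries stays fixed instead of growing with the number of results, which is the property to check for in the debug toolbar's SQL panel.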

Apologue answered 14/11, 2017 at 2:54 Comment(0)

Try adding

HAYSTACK_LIMIT_TO_REGISTERED_MODELS = False

to your settings.py. As per the docs,

'If your search index is never used for anything other than the models registered with Haystack, you can turn this off and get a small to moderate performance boost.'

It knocked 3-4 seconds off for my project.

Tace answered 29/8, 2018 at 16:44 Comment(0)