Use of full-text search + GIN in a view (Django 1.11 )
Asked Answered
R

1

8

I need some help with building proper query in a django view for full-text search using GIN index. I have quite a big database (~400k lines) and need to do a full-text search on 3 fields from it. Tried to use django docs search and this is code BEFORE GIN. It works, but takes 6+ seconds to search over all fields. Next I tried to implement a GIN index to speed up my search. There are a lot of questions already how to build it. But my question is - how does the view query change when using a GIN index for search? What fields should I search?

Before GIN:

models.py

class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True)
    shop = models.ForeignKey("Shop", to_field="shop_name")
    brand = models.ForeignKey("Brand", to_field="brand_name")
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)

views.py

   
def get_cosmetic(request):
    if request.method == "GET":
        pass
    else:
        search_words = request.POST.get("search")
        search_vectors = (
            SearchVector("title", weight="B")
            + SearchVector("description", weight="C")
            + SearchVector("brand__brand_name", weight="A")
        )

        products = (
            Product.objects.annotate(
                search=search_vectors, rank=SearchRank(search_vectors, search)
            )
            .filter(search=search_words)
            .order_by("-rank")
        )

        return render(request, "example.html", {"products": products})

After GIN:
models.py

class ProductManager(models.Manager):
    def with_documents(self):
        vector = (
            pg_search.SearchVector("brand__brand_name", weight="A")
            + pg_search.SearchVector("title", weight="A")
            + pg_search.SearchVector("description", weight="C")
        )
        return self.get_queryset().annotate(document=vector)


class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True)
    shop = models.ForeignKey("Shop", to_field="shop_name")
    brand = models.ForeignKey("Brand", to_field="brand_name")
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)

    search_vector = pg_search.SearchVectorField(null=True)

    objects = ProductManager()

    class Meta:
        indexes = [
            indexes.GinIndex(
                fields=["search_vector"],
                name="title_index",
            ),
        ]

    # update search_vector every time the entry updates
    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        if (
            "update_fields" not in kwargs
            or "search_vector" not in kwargs["update_fields"]
        ):
            instance = (
                self._meta.default_manager
                .with_documents().get(pk=self.pk)
            )
            instance.search_vector = instance.document
            instance.save(update_fields=["search_vector"])

views.py

def get_cosmetic(request):
    if request.method == "GET":
        pass

    else:
        search_words = request.POST.get('search')    
        products = ?????????
        return render(request, 'example.html', {"products": products})
Rule answered 19/11, 2017 at 14:27 Comment(4)
Why can't we assign search_vector directly to instance.search_vector? I cannot understand, pls explain for me. thanksMontherlant
@Montherlant as it is been said in docs (docs.djangoproject.com/en/2.0/ref/models/instances/…) - .save(update_fields=['search_vector']) can update only selected fields and increase the performance. If I correctly understand a question.Rule
I was asking why we do that instance = self._meta.default_manager.with_documents().get(pk=self.pk) instance.search_vector = instance.document Why can't we assign like that: instance.search_vector = SearchVector(...)Montherlant
@Montherlant I think you are right, it can also work (but I am not sure) - the part of this recipe is taken from the link from the answer. May be that simplifies queryset calls by just calling manager method. If you test that - share the results pls, I am curious.Rule
R
10

Answering my own question:

products = (
    Product.objects.annotate(rank=SearchRank(F("search_vector"), search_words))
    .filter(search_vector=search_words)
    .order_by("-rank")
)


This means you should search your index field - in my case search_vector field.
Also I have changed my code a bit in ProductManager() class, so now I can just use

products = Product.objects.with_documents(search_words)

Where with_documents() is a custom function of custom ProductManager(). The recipe of this change is here (page 29).

What does all this code do:

  1. creates search_vector with scores to fields, field with bigger score - gets higher place in result sorting.
  2. creates GIN index for full-text search via ORM Django
  3. updates GIN index every time the instance of model is changed

    What this code dosn't do:
  4. It doesn't sort by relevance of substring which is queried. Possible solution.

    Hope this will help somebody with a bit complicated full-text search in Django.
Rule answered 7/12, 2017 at 16:37 Comment(3)
here (page 30) is not working.Castorina
@AnshulTiwari ahh, I checked it a couple of months ago and it was still working :(Rule
@AnshulTiwari have a look at this presentation - ep2017.europython.eu/media/conference/slides/… it looks similar to what I based previously (page 29)Rule

© 2022 - 2024 — McMap. All rights reserved.