Django Postgres Full Text TrigramSimilarity Multiple Fields
Asked Answered
C

1

7

I am trying to figure out how to use TrigramSimilarity with unaccent for multiple fields.

So far i have:

def filter_by_location(self, queryset, location):
    log.info('Filtering location: "{location}"'.format(**locals()))
    queryset = queryset.filter(
        Q(accommodation__district__name__unaccent__trigram_similar=location) |
        Q(accommodation__municipality__name__unaccent__trigram_similar=location) |
        Q(accommodation__settlement__name__unaccent__trigram_similar=location)
    )

But i read in the documentation that i could order by similarity (not sure if the code above does that automatically), so i tried doing:

    queryset = queryset.filter(
        Q(accommodation__district__name__unaccent__trigram_similar=location) |
        Q(accommodation__municipality__name__unaccent__trigram_similar=location) |
        Q(accommodation__settlement__name__unaccent__trigram_similar=location)
    ).annotate(
        similarity=TrigramSimilarity(
            'accommodation__district__name', location
        ) + TrigramSimilarity(
            'accommodation__municipality__name', location
        ) + TrigramSimilarity(
            'accommodation__settlement__name', location
        ),
    ).filter(similarity__gt=0.3).order_by('-similarity')

I soon realized that the + in TrigramSimilarity was not doing the OR, but i need to full text search across all those different fields.

What is the correct syntax for me to achieve that (use an OR instead of and AND) taking query performance into account??

Thanks

Clasp answered 16/5, 2017 at 17:13 Comment(0)
S
12

I think the code below could be of help, but I haven't ran it:

queryset = queryset.annotate(
    similarity = Greatest(
        TrigramSimilarity('accommodation__district__name', location),
        TrigramSimilarity('accommodation__municipality__name', location),
        TrigramSimilarity('accommodation__settlement__name', location))
).filter(
    Q(accommodation__district__name__unaccent__trigram_similar=location) |
    Q(accommodation__municipality__name__unaccent__trigram_similar=location) |
    Q(accommodation__settlement__name__unaccent__trigram_similar=location),
  similarity__gt=0.3).order_by('-similarity')

Greatest can be imported like below:

from django.db.models.functions import Greatest
Satellite answered 16/5, 2017 at 23:0 Comment(7)
does not work.. i get a 'TrigramSimilarity' object has no attribute 'split'Clasp
@Clasp I removed the F calls. Could you try again?Satellite
it seems to work, but i am still not very happy regarding performance.. do you have any ideas on how to optimize this query?? (i already have indexes in place)Clasp
i found this on the docs "unaccent lookups should perform fine in most use cases. However, queries using this filter will generally perform full table scans, which can be slow on large tables. In those cases, using dedicated full text indexing tools might be appropriate." . So maybe i should avoid using unaccent if performance becomes a real issue here.Clasp
Actually, I would recommend using Solr or Elasticsearch for decent performance in text search. There's also Django Haystack: haystacksearch.org (which supports various backends)Satellite
not an option for me at the moment... thanks anywaysClasp
Sorry to hear that. Another thing I can suggest you is to have a field in each of your models which contains the unaccented string. You could do that by overriding the save method, and you could use unidecode for removing the accents. You would then do the lookups in the "unaccented" fields. Hope it helps.Satellite

© 2022 - 2024 — McMap. All rights reserved.