Creating a GIN index with trigram (gin_trgm_ops) in a Django model

The new TrigramSimilarity feature of django.contrib.postgres was great for a problem I had. I use it in a search bar to find hard-to-spell Latin names. The problem is that there are over 2 million names, and the search takes longer than I want.

I'd like to create an index on the trigrams as described in the PostgreSQL documentation.

But I am not sure how to do this in a way that the Django API would make use of it. For the postgres text search there is a description on how to create an index, but not for the trigram similarity.
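
For context, a rough sketch of what the trigram similarity score measures may help. The pure-Python approximation below follows the behaviour described in the pg_trgm documentation: lowercase the text, pad each word with two leading spaces and one trailing space, then compare the sets of three-character substrings. The function names trigrams and similarity are illustrative and not part of Django or PostgreSQL, and the real extension may differ in details such as non-ASCII handling.

```python
import re


def trigrams(text):
    """Approximate pg_trgm's trigram extraction for ASCII text (illustration only)."""
    grams = set()
    # Lowercase the input and treat every non-alphanumeric character as a word break.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before and one after each word
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("word", "two words"))
```

Under this definition a misspelled name still shares most of its trigrams with the correct spelling, which is why a threshold such as 0.3 is workable for a search bar.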

This is what I have right now:

class NCBI_names(models.Model):
    tax_id          =   models.ForeignKey(NCBI_nodes, on_delete=models.CASCADE, default = 0)
    name_txt        =   models.CharField(max_length=255, default = '')
    name_class      =   models.CharField(max_length=32, db_index=True, default = '')

    class Meta:
        indexes = [GinIndex(fields=['name_txt'])]

In the view's get_queryset method:

class TaxonSearchListView(ListView):    
    #form_class=TaxonSearchForm
    template_name='collectie/taxon_list.html'
    paginate_by=20
    model=NCBI_names
    context_object_name = 'taxon_list'

    def dispatch(self, request, *args, **kwargs):
        query = request.GET.get('q')
        if query:
            try:
                tax_id = self.model.objects.get(name_txt__iexact=query).tax_id.tax_id
                return redirect('collectie:taxon_detail', tax_id)
            except (self.model.DoesNotExist, self.model.MultipleObjectsReturned) as e:
                return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)
        else:
            return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)
    
    def get_queryset(self):
        result = super(TaxonSearchListView, self).get_queryset()
        #
        query = self.request.GET.get('q')
        if query:            
            result = result.exclude(name_txt__icontains = 'sp.')
            result = result.annotate(similarity=TrigramSimilarity('name_txt', query)).filter(similarity__gt=0.3).order_by('-similarity')
        return result
Timberland answered 29/6, 2017 at 8:46 Comment(1)
I added the index with that option through a PostgreSQL front end, but it didn't seem to change anything. Could it have to do with the way the query is made? - Timberland

I found a December 2020 article that does this with a recent version of the Django ORM:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class Author(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            GinIndex(
                name='review_author_ln_gin_idx', 
                fields=['last_name'], 
                opclasses=['gin_trgm_ops'],
            )
        ]

If, like the original poster, you want an index that works with icontains, you have to index UPPER() of the column, which requires wrapping the expression in OpClass (Django 3.2 or later):

from django.contrib.postgres.indexes import GinIndex, OpClass
from django.db import models
from django.db.models.functions import Upper

class Author(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            GinIndex(
                OpClass(Upper('last_name'), name='gin_trgm_ops'),
                name='review_author_ln_gin_idx',
            )
        ]

To use it, you need to add 'django.contrib.postgres' to your INSTALLED_APPS.
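
A minimal settings fragment, as a sketch: the surrounding entries and the app label review are placeholders (the label is guessed from the index name above and is an assumption). Only the django.contrib.postgres line is the point.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "django.contrib.postgres",  # needed for the PostgreSQL-specific index classes above
    "review",  # hypothetical app that contains the Author model
]
```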


Inspired by an old article on this subject, I found a more recent one that gives a solution for a GistIndex (the GistIndexTrgrmOps subclass further down).

Update: From Django 1.11 things seem to be simpler, as this answer and the Django docs suggest:

from django.contrib.postgres.indexes import GinIndex

class MyModel(models.Model):
    the_field = models.CharField(max_length=512, db_index=True)

    class Meta:
        indexes = [GinIndex(fields=['the_field'])]

From Django 2.2, an opclasses argument is available in class Index(fields=(), name=None, db_tablespace=None, opclasses=()) for this purpose.


from django.contrib.postgres.indexes import GistIndex

class GistIndexTrgrmOps(GistIndex):
    def create_sql(self, model, schema_editor):
        # - this Statement is instantiated by the _create_index_sql()
        #   method of django.db.backends.base.schema.BaseDatabaseSchemaEditor.
        #   using sql_create_index template from
        #   django.db.backends.postgresql.schema.DatabaseSchemaEditor
        # - the template has original value:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"
        statement = super().create_sql(model, schema_editor)
        # - however, we want to use a GIST index to accelerate trigram
        #   matching, so we want to add the gist_trgm_ops index operator
        #   class
        # - so we replace the template with:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
        statement.template = (
            "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
        )

        return statement

Which you can then use in your model class like this:

class YourModel(models.Model):
    some_field = models.TextField(...)

    class Meta:
        indexes = [
            GistIndexTrgrmOps(fields=['some_field'])
        ]
Legislate answered 16/8, 2018 at 15:31 Comment(1)
Don't forget to add 'django.contrib.postgres' to INSTALLED_APPS in your Django settings before you run the migrations. - Amblyoscope

I had a similar problem, trying to use the pg_trgm extension to support efficient contains and icontains Django field lookups.

There may be a more elegant way, but defining a new index type like this worked for me:

from django.contrib.postgres.indexes import GinIndex

class TrigramIndex(GinIndex):
    def get_sql_create_template_values(self, model, schema_editor, using):
        fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
        tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
        quote_name = schema_editor.quote_name
        columns = [
            ('%s %s' % (quote_name(field.column), order)).strip() + ' gin_trgm_ops'
            for field, (field_name, order) in zip(fields, self.fields_orders)
        ]
        return {
            'table': quote_name(model._meta.db_table),
            'name': quote_name(self.name),
            'columns': ', '.join(columns),
            'using': using,
            'extra': tablespace_sql,
        }

The method get_sql_create_template_values is copied from Index.get_sql_create_template_values(), with just one modification: the addition of + ' gin_trgm_ops'.

For your use case, you would then define the index on name_txt using this TrigramIndex instead of a GinIndex. Then run makemigrations, which will produce a migration that generates the required CREATE INDEX SQL.

UPDATE:

I see you're also doing a query using icontains:

result.exclude(name_txt__icontains = 'sp.')

The PostgreSQL backend will turn that into something like this:

UPPER("NCBI_names"."name_txt"::text) LIKE UPPER('sp.')

and then the trigram index won't be used because of the UPPER().
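
To make the mismatch concrete, the sketch below only builds SQL strings and does not touch a database. The helper name trigram_index_sql and the table and index names are hypothetical. It puts the two index definitions side by side: an expression index is generally only considered when the query uses the same expression, so the UPPER(...) LIKE predicate above would need the second form.

```python
def trigram_index_sql(table, column, name, upper=False):
    """Return a CREATE INDEX statement for a GIN trigram index (illustration only)."""
    quoted = '"%s"' % column
    # Wrap the column in UPPER() so the index matches UPPER(col) LIKE UPPER(pattern).
    expression = "UPPER(%s)" % quoted if upper else quoted
    return 'CREATE INDEX "%s" ON "%s" USING gin (%s gin_trgm_ops)' % (
        name, table, expression)


# Hypothetical table and index names, reusing the name_txt column from the question.
plain_sql = trigram_index_sql("names", "name_txt", "names_trgm_idx")
upper_sql = trigram_index_sql("names", "name_txt", "names_upper_trgm_idx", upper=True)
print(plain_sql)
print(upper_sql)
```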

I had the same problem, and ended up subclassing the database backend to work around it:

from django.db.backends.postgresql import base, operations

class DatabaseFeatures(base.DatabaseFeatures):
    pass

class DatabaseOperations(operations.DatabaseOperations):
    def lookup_cast(self, lookup_type, internal_type=None):
        lookup = '%s'

        # Cast text lookups to text to allow things like filter(x__contains=4)
        if lookup_type in ('iexact', 'contains', 'icontains', 'startswith',
                           'istartswith', 'endswith', 'iendswith', 'regex', 'iregex'):
            if internal_type in ('IPAddressField', 'GenericIPAddressField'):
                lookup = "HOST(%s)"
            else:
                lookup = "%s::text"

        return lookup


class DatabaseWrapper(base.DatabaseWrapper):
    """
        Override the defaults where needed to allow use of trigram index
    """
    ops_class = DatabaseOperations

    def __init__(self, *args, **kwargs):
        self.operators.update({
            'icontains': 'ILIKE %s',
            'istartswith': 'ILIKE %s',
            'iendswith': 'ILIKE %s',
        })
        self.pattern_ops.update({
            'icontains': "ILIKE '%%' || {} || '%%'",
            'istartswith': "ILIKE {} || '%%'",
            'iendswith': "ILIKE '%%' || {}",
        })
        super(DatabaseWrapper, self).__init__(*args, **kwargs)
Rama answered 7/7, 2017 at 5:8 Comment(1)
get_sql_create_template_values does not work in Django 3. - Formative

To make Django 2.2 use the index for icontains and similar searches:

Subclass GinIndex to make a case-insensitive index (uppercasing all field values):

from django.contrib.postgres.indexes import GinIndex

class UpperGinIndex(GinIndex):

    def create_sql(self, model, schema_editor, using=''):
        statement = super().create_sql(model, schema_editor, using=using)
        quote_name = statement.parts['columns'].quote_name

        def upper_quoted(column):
            return f'UPPER({quote_name(column)})'
        statement.parts['columns'].quote_name = upper_quoted
        return statement

Add the index to your model like this, including kwarg name which is required when using opclasses:

class MyModel(Model):
    name = TextField(...)

    class Meta:
        indexes = [
            UpperGinIndex(fields=['name'], name='mymodel_name_gintrgm', opclasses=['gin_trgm_ops'])
        ]

Generate the migration and edit the generated file:

# Generated by Django 2.2.3 on 2019-07-15 10:46
from django.contrib.postgres.operations import TrigramExtension  # <<< add this
from django.db import migrations
import myapp.models


class Migration(migrations.Migration):

    operations = [
        TrigramExtension(),   # <<< add this
        migrations.AddIndex(
            model_name='mymodel',
            index=myapp.models.UpperGinIndex(fields=['name'], name='mymodel_name_gintrgm', opclasses=['gin_trgm_ops']),
        ),
    ]
Calondra answered 15/7, 2019 at 11:26 Comment(0)

This already has an answer, but in Django 2.2 you can do this much more easily:

from django.contrib.postgres.indexes import GistIndex
from django.db import models

class MyModel(models.Model):
   name = models.TextField()
   class Meta:
       indexes = [GistIndex(name="gist_trgm_idx", fields=("name",), opclasses=("gist_trgm_ops",))]

Alternatively you can use GinIndex.

Elea answered 4/12, 2019 at 7:52 Comment(2)
This is the "native" way of doing it since code.djangoproject.com/ticket/28077 - Formative
Will this be case insensitive, though? - Calondra

In case someone wants an index on multiple columns joined (concatenated) with a space, you can use my modification of the built-in index.

Creates index like gin (("column1" || ' ' || "column2" || ' ' || ...) gin_trgm_ops)

from django.contrib.postgres.indexes import GinIndex

class GinSpaceConcatIndex(GinIndex):

    def get_sql_create_template_values(self, model, schema_editor, using):

        fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
        tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
        quote_name = schema_editor.quote_name
        columns = [
            ('%s %s' % (quote_name(field.column), order)).strip()
            for field, (field_name, order) in zip(fields, self.fields_orders)
        ]
        return {
            'table': quote_name(model._meta.db_table),
            'name': quote_name(self.name),
            'columns': "({}) gin_trgm_ops".format(" || ' ' || ".join(columns)),
            'using': using,
            'extra': tablespace_sql,
        }
Klinger answered 18/12, 2017 at 7:46 Comment(1)
get_sql_create_template_values does not work in Django 3. - Formative

Ensure that django.contrib.postgres has been added to INSTALLED_APPS.

Source: https://code.djangoproject.com/ticket/32770

Saarinen answered 16/5, 2023 at 8:28 Comment(0)