How to use SQLAlchemy to create a full text search index on SQLite and query it?

I am creating a simple app that performs basic operations, with SQLite as the database. I want to perform wildcard searches, but I know they have poor performance. I want to try full-text search instead, but I cannot find an example of how to do it. I have confirmed that SQLite supports full-text search. Here is my sample code.

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, unique=True, nullable=False)
    thumb = db.Column(db.Text, nullable=False, default="")

    role = db.relationship("Role", backref="person", cascade="delete")


class Role(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    person_id = db.Column(db.Integer, db.ForeignKey(Person.id, ondelete="CASCADE"), nullable=False)
    role = db.Column(db.Text, nullable=False)

How can I create an FTS index and query it using SQLAlchemy? For example, searching name in Person.

Agential asked 18/4, 2018 at 7:44 · Comments (3)
@IljaEverilä Running raw commands is not ideal. I am using alembic to maintain the database schema.Agential
There's nothing stopping you from using raw SQL and alembic together.Lal
After a lot of twiddling, one solution could be to create an external content FTS table and map that as a non-primary mapper for your class. It's not pretty, but it works. Tried looking around for tools that'd automate that already, but did not find one yet.Lal

FTS5 provides virtual tables that support full-text search. In other words, you cannot create a full-text index on a column of an existing table. Instead, you create an FTS5 virtual table and copy the relevant data from your original table into it for indexing. To avoid storing the same data twice you can make it an external content table, though you will still have to keep the FTS5 table in sync, either manually or with triggers.
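
A minimal sketch of the external content setup using only Python's built-in sqlite3 module, assuming the linked SQLite library was compiled with FTS5 (true for most current builds). The person and person_idx names mirror the question; here the index is kept in sync by hand, which is the bookkeeping that triggers can automate.

```python
import sqlite3


def demo_external_content() -> list[tuple[int, str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- External content table: FTS5 stores only the index and reads
        -- column values back from "person" through content_rowid.
        CREATE VIRTUAL TABLE person_idx USING fts5(
            name, content='person', content_rowid='id'
        );
        """
    )
    rows = [(1, "John Smith"), (2, "Jane Doe"), (3, "Johnny Walker")]
    for pid, name in rows:
        conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", (pid, name))
        # Manual sync: index the same text under the same rowid.
        conn.execute(
            "INSERT INTO person_idx (rowid, name) VALUES (?, ?)", (pid, name)
        )
    # MATCH uses the index; person_idx.rowid maps back to person.id.
    result = conn.execute(
        "SELECT person.id, person.name FROM person_idx "
        "JOIN person ON person.id = person_idx.rowid "
        "WHERE person_idx MATCH ? ORDER BY person.id",
        ("john",),
    ).fetchall()
    conn.close()
    return result


print(demo_external_content())  # [(1, 'John Smith')]
```

With the default tokenizer "john" matches the token john case-insensitively but not johnny; a prefix query such as john* would be needed for that.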

You could create a generic custom DDL construct that handles creating an FTS5 virtual table mirroring another table:

from sqlalchemy import String
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.schema import DDLElement


class CreateFtsTable(DDLElement):
    """Represents a CREATE VIRTUAL TABLE ... USING fts5 statement, for indexing
    a given table.

    """

    def __init__(self, table, version=5):
        self.table = table
        self.version = version


@compiles(CreateFtsTable)
def compile_create_fts_table(element, compiler, **kw):
    """Render an external content FTS virtual table named <table>_idx that
    mirrors every non-primary-key column of element.table.
    """
    tbl = element.table
    version = element.version
    preparer = compiler.preparer
    sql_compiler = compiler.sql_compiler

    vtbl_name = preparer.quote(tbl.name + "_idx")

    text = "\nCREATE VIRTUAL TABLE "
    text += vtbl_name + " "
    text += "USING fts" + str(version) + "("

    separator = "\n"

    # Assumes a single-column primary key; it becomes the FTS rowid.
    pk_column, = tbl.primary_key
    columns = [col for col in tbl.columns if col is not pk_column]

    for column in columns:
        text += separator
        separator = ", \n"
        text += "\t" + preparer.format_column(column)

        # Non-string columns are stored in the virtual table but not tokenized.
        if not isinstance(column.type, String):
            text += " UNINDEXED"

    text += separator
    text += "\tcontent=" + sql_compiler.render_literal_value(
            tbl.name, String())

    text += separator
    text += "\tcontent_rowid=" + sql_compiler.render_literal_value(
            pk_column.name, String())

    text += "\n)\n\n"
    return text

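The given implementation is a bit naive: it includes every non-primary-key column, indexing String/Text columns and marking all others UNINDEXED, and it assumes a single-column primary key. The created virtual table is implicitly named by appending _idx to the original table name.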

That alone is not enough if you also want triggers to keep the tables in sync automatically. Since you are adding an index for just one table, you could instead simply use textual DDL in your migration script:

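import sqlalchemy as sa
from alembic import op


def upgrade():
    ddl = [
        """
        CREATE VIRTUAL TABLE person_idx USING fts5(
            name,
            thumb UNINDEXED,
            content='person',
            content_rowid='id'
        )
        """,
        # Index new rows.
        """
        CREATE TRIGGER person_ai AFTER INSERT ON person BEGIN
            INSERT INTO person_idx (rowid, name, thumb)
            VALUES (new.id, new.name, new.thumb);
        END
        """,
        # FTS5 'delete' command: remove the old values from the index.
        """
        CREATE TRIGGER person_ad AFTER DELETE ON person BEGIN
            INSERT INTO person_idx (person_idx, rowid, name, thumb)
            VALUES ('delete', old.id, old.name, old.thumb);
        END
        """,
        # An update is a delete of the old values plus an insert of the new ones.
        """
        CREATE TRIGGER person_au AFTER UPDATE ON person BEGIN
            INSERT INTO person_idx (person_idx, rowid, name, thumb)
            VALUES ('delete', old.id, old.name, old.thumb);
            INSERT INTO person_idx (rowid, name, thumb)
            VALUES (new.id, new.name, new.thumb);
        END
        """
    ]

    for stmt in ddl:
        op.execute(sa.DDL(stmt))


def downgrade():
    # Drop the triggers first, then the virtual table; "person" is untouched.
    for stmt in ("DROP TRIGGER IF EXISTS person_au",
                 "DROP TRIGGER IF EXISTS person_ad",
                 "DROP TRIGGER IF EXISTS person_ai",
                 "DROP TABLE IF EXISTS person_idx"):
        op.execute(stmt)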

If your person table already contains data, remember to insert those rows into the virtual table as well so that they get indexed.
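
For existing rows, FTS5 external content tables accept a special 'rebuild' command that regenerates the whole index from the content table. A minimal sqlite3 sketch, assuming an FTS5-enabled SQLite build and the same person/person_idx schema as the migration above:

```python
import sqlite3


def rebuild_demo() -> list[int]:
    conn = sqlite3.connect(":memory:")
    # The content table already has rows before the FTS table exists.
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, thumb TEXT NOT NULL DEFAULT '')"
    )
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(1, "John Smith"), (2, "Jane Doe")],
    )
    conn.execute(
        "CREATE VIRTUAL TABLE person_idx USING fts5("
        "name, thumb UNINDEXED, content='person', content_rowid='id')"
    )
    # 'rebuild' discards the current index and re-reads every row of the
    # content table, so the pre-existing rows become searchable.
    conn.execute("INSERT INTO person_idx (person_idx) VALUES ('rebuild')")
    hits = [
        rowid
        for (rowid,) in conn.execute(
            "SELECT rowid FROM person_idx WHERE person_idx MATCH 'jane'"
        )
    ]
    conn.close()
    return hits


print(rebuild_demo())  # [2]
```

In the Alembic migration the same statement could be issued with op.execute("INSERT INTO person_idx (person_idx) VALUES ('rebuild')") right after the CREATE VIRTUAL TABLE statement.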

In order to actually use the created virtual table you could create a non-primary mapper for Person:

person_idx = db.Table('person_idx', db.metadata,
                      db.Column('rowid', db.Integer(), primary_key=True),
                      db.Column('name', db.Text()),
                      db.Column('thumb', db.Text()))

PersonIdx = db.mapper(
    Person, person_idx, non_primary=True,
    properties={
        'id': person_idx.c.rowid
    }
)

To run a full-text query, for example with MATCH:

db.session.query(PersonIdx).\
    filter(PersonIdx.c.name.op("MATCH")("john")).\
    all()

Note that the result is a list of Person objects. PersonIdx is just a Mapper.
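
The string passed to MATCH is written in FTS5's own query language, which supports prefix searches, column filters and boolean operators. A small sqlite3 sketch of a few such queries, assuming an FTS5-enabled SQLite; docs is a hypothetical, ordinary (not external content) FTS5 table used only to keep the example short. ORDER BY rank sorts by FTS5's built-in bm25 relevance. The same query strings can be passed to .op("MATCH")(...) in the SQLAlchemy queries above.

```python
import sqlite3

# Hypothetical demo table; a plain FTS5 table stores its own copy of the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(name, role)")
conn.executemany(
    "INSERT INTO docs (name, role) VALUES (?, ?)",
    [
        ("John Smith", "director"),
        ("Johnny Walker", "actor"),
        ("Jane Doe", "stand-in for john"),
    ],
)


def search(query: str) -> list[str]:
    """Return names matching an FTS5 query string, best matches first."""
    rows = conn.execute(
        "SELECT name FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [name for (name,) in rows]


print(search("john"))               # token "john" in any column
print(search("name : john"))        # column filter: only the name column
print(search("jo*"))                # prefix query: john, johnny, ...
print(search("john NOT director"))  # boolean operators
```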


As noted by Victor K., non-primary mappers are deprecated (as of SQLAlchemy 1.3), and the alternative is to use aliased(). The setup is mostly the same, but the rowid-to-id mapping now has to happen when creating the person_idx Table, using the key parameter of Column:

person_idx = db.Table('person_idx', db.metadata,
                      db.Column('rowid', db.Integer(), key='id', primary_key=True),
                      db.Column('name', db.Text()),
                      db.Column('thumb', db.Text()))

and instead of a new mapper create the alias:

PersonIdx = db.aliased(Person, person_idx, adapt_on_names=True)

The alias works more like the mapped class in that you do not access mapped attributes through .c, but directly:

db.session.query(PersonIdx).\
    filter(PersonIdx.name.op("MATCH")("john")).\
    all()

Incorruptible answered 19/4, 2018 at 9:46 · Comments (11)
Given that non-primary mappers are deprecated as of SQLAlchemy 1.3 according to this, is there a modern way to accomplish the same?Flessel
Perhaps something using aliased will work, would have to check. The possibly problematic field would be rowid (vs. your pk), but perhaps that could be controlled while creating the FTS vtable.Lal
@VictorK. Yep, using aliased() works, given that you pass key='id' to the rowid column when creating person_idx, and adapt_on_names=True to aliased().Lal
@IljaEverilä how can I do a MATCH on multiple columns at once?Tersanctus
Use the usual boolean operators. Depending on your use case you could also just concatenate multiple text columns into one index column.Lal
@Tersanctus Additionally you can also query all columns in the virtual table at once by specifying the table name as the column name, like column("person_idx").op("MATCH")("john").Lal
Nice trick with the table name as the column name, thanks!Tersanctus
sqlite.org/fts5.html#full_text_query_syntax is worth the read.Lal
Could anyone clarify why we need a table alias here? Is it so that we select from the FTS table, while retaining the "behaviour" in python of the content table?Lowrance
You are spot on.Lal
A word of caution with the aliased approach: it seems to be a dangerous footgun. If your FTS table is not created yet, passing the Base metadata to the Table constructor makes SQLAlchemy try to create the FTS table itself, which causes obscure errors later. I'm not sure what the best solution is, as I'm not very familiar with what the metadata argument does; my understanding is that it associates a collection of tables together. What I did was set a flag on the FTS Table to make sure SQLAlchemy didn't try to create it.Lowrance
