FTS5 provides virtual tables that support full-text search. In other words you cannot create a full-text index on a column in an existing table. Instead you can create an FTS5 virtual table and copy relevant data from your original table over for indexing. In order to avoid storing the same data twice you can make it an external content table, though you will still have to make sure that the FTS5 table is kept in sync, either manually or with triggers.
You could create a generic custom DDL construct that'd handle creating a FTS5 virtual table that mirrors another table:
class CreateFtsTable(DDLElement):
"""Represents a CREATE VIRTUAL TABLE ... USING fts5 statement, for indexing
a given table.
"""
def __init__(self, table, version=5):
self.table = table
self.version = version
@compiles(CreateFtsTable)
def compile_create_fts_table(element, compiler, **kw):
"""
"""
tbl = element.table
version = element.version
preparer = compiler.preparer
sql_compiler = compiler.sql_compiler
tbl_name = preparer.format_table(tbl)
vtbl_name = preparer.quote(tbl.name + "_idx")
text = "\nCREATE VIRTUAL TABLE "
text += vtbl_name + " "
text += "USING fts" + str(version) + "("
separator = "\n"
pk_column, = tbl.primary_key
columns = [col for col in tbl.columns if col is not pk_column]
for column in columns:
text += separator
separator = ", \n"
text += "\t" + preparer.format_column(column)
if not isinstance(column.type, String):
text += " UNINDEXED"
text += separator
text += "\tcontent=" + sql_compiler.render_literal_value(
tbl.name, String())
text += separator
text += "\tcontent_rowid=" + sql_compiler.render_literal_value(
pk_column.name, String())
text += "\n)\n\n"
return text
The given implementation is a bit naive and indexes all text columns by default. The created virtual table is implicitly named by adding _idx
after the original table name.
But that alone is not enough, if you want to automate keeping the tables in sync with triggers, and since you're adding an index for just one table, you could just opt to use text DDL constructs in your migration script:
def upgrade():
ddl = [
"""
CREATE VIRTUAL TABLE person_idx USING fts5(
name,
thumb UNINDEXED,
content='person',
content_rowid='id'
)
""",
"""
CREATE TRIGGER person_ai AFTER INSERT ON person BEGIN
INSERT INTO person_idx (rowid, name, thumb)
VALUES (new.id, new.name, new.thumb);
END
""",
"""
CREATE TRIGGER person_ad AFTER DELETE ON person BEGIN
INSERT INTO person_idx (person_idx, rowid, name, thumb)
VALUES ('delete', old.id, old.name, old.thumb);
END
""",
"""
CREATE TRIGGER person_au AFTER UPDATE ON person BEGIN
INSERT INTO person_idx (person_idx, rowid, name, thumb)
VALUES ('delete', old.id, old.name, old.thumb);
INSERT INTO person_idx (rowid, name, thumb)
VALUES (new.id, new.name, new.thumb);
END
"""
]
for stmt in ddl:
op.execute(sa.DDL(stmt))
If your person table contains existing data, remember to insert those to the created virtual table as well for indexing.
In order to actually use the created virtual table you could create a non-primary mapper for Person
:
person_idx = db.Table('person_idx', db.metadata,
db.Column('rowid', db.Integer(), primary_key=True),
db.Column('name', db.Text()),
db.Column('thumb', db.Text()))
PersonIdx = db.mapper(
Person, person_idx, non_primary=True,
properties={
'id': person_idx.c.rowid
}
)
And to make a full-text query using for example MATCH:
db.session.query(PersonIdx).\
filter(PersonIdx.c.name.op("MATCH")("john")).\
all()
Note that the result is a list of Person
objects. PersonIdx
is just a Mapper
.
As noted by Victor K. the use of non-primary mappers is deprecated and the new alternative is to use aliased()
. The setup is mostly the same, but the rowid
to id
mapping needs to take place when creating person_idx
Table
using the key
parameter of Column
:
person_idx = db.Table('person_idx', db.metadata,
db.Column('rowid', db.Integer(), key='id', primary_key=True),
db.Column('name', db.Text()),
db.Column('thumb', db.Text()))
and instead of a new mapper create the alias:
PersonIdx = db.aliased(Person, person_idx, adapt_on_names=True)
The alias works more like the mapped class in that you do not access mapped attributes through .c
, but directly:
db.session.query(PersonIdx).\
filter(PersonIdx.name.op("MATCH")("john")).\
all()