After I have implemented a full text search function in my application using Sqlite and FTS tables I would be interested in a performant way of retrieving the FULL inverted index out of my FTS table. In effect - I would need a result table including a mapping between all terms -> docid's -> number of occurences.
Following the Sqlite FTS documentation - after the creation of the tables
-- Create an FTS4 table
CREATE VIRTUAL TABLE ft USING fts4(x, y);
-- Create an fts4aux table to access the full-text index for table "ft"
CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);
... and content insertion ...
INSERT INTO ft(x, y) VALUES('Apple banana', 'Cherry');
INSERT INTO ft(x, y) VALUES('Banana Date Date', 'cherry');
INSERT INTO ft(x, y) VALUES('Cherry Elderberry', 'Elderberry');
... instead of having just terms and number of occurences over all documents like in the FTS AUX table...
SELECT term, col, documents, occurrences FROM ft_terms;
-- apple | * | 1 | 1
-- apple | 0 | 1 | 1
-- banana | * | 2 | 2
-- banana | 0 | 2 | 2
-- cherry | * | 3 | 3
-- cherry | 0 | 1 | 1
-- cherry | 1 | 2 | 2
-- date | * | 1 | 2
-- date | 0 | 1 | 2
-- elderberry | * | 1 | 2
-- elderberry | 1 | 1 | 1
-- elderberry | 1 | 1 | 1
My result should look like the table below:
Term |col |docid| occurences
------------------------------------------
-- apple | 0 | 1 | 1
-- banana | 0 | 2 | 1
-- cherry | 0 | 3 | 1
-- cherry | 1 | 1 | 1
-- cherry | 1 | 2 | 1
-- date | 0 | 2 | 2
-- elderberry | 0 | 3 | 1
-- elderberry | 1 | 3 | 1
I'm still not sure if a simple match query over all terms in the document collection is effective enough - maybe there is a more direct way?