Get inverted index from SQLite FTS table
Asked Answered
M

0

7

After I have implemented a full text search function in my application using Sqlite and FTS tables I would be interested in a performant way of retrieving the FULL inverted index out of my FTS table. In effect - I would need a result table including a mapping between all terms -> docid's -> number of occurences.

Following the Sqlite FTS documentation - after the creation of the tables

-- Create an FTS4 table
CREATE VIRTUAL TABLE ft USING fts4(x, y);

-- Create an fts4aux table to access the full-text index for table "ft"
CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);

... and content insertion ...

INSERT INTO ft(x, y) VALUES('Apple banana', 'Cherry');
INSERT INTO ft(x, y) VALUES('Banana Date Date', 'cherry');
INSERT INTO ft(x, y) VALUES('Cherry Elderberry', 'Elderberry');

... instead of having just terms and number of occurences over all documents like in the FTS AUX table...

SELECT term, col, documents, occurrences FROM ft_terms;

--     apple       |  *  |  1  |  1
--     apple       |  0  |  1  |  1
--     banana      |  *  |  2  |  2
--     banana      |  0  |  2  |  2
--     cherry      |  *  |  3  |  3
--     cherry      |  0  |  1  |  1
--     cherry      |  1  |  2  |  2
--     date        |  *  |  1  |  2
--     date        |  0  |  1  |  2
--     elderberry  |  *  |  1  |  2
--     elderberry  |  1  |  1  |  1
--     elderberry  |  1  |  1  |  1

My result should look like the table below:

           Term    |col  |docid| occurences
------------------------------------------
--     apple       |  0  |  1  |  1
--     banana      |  0  |  2  |  1
--     cherry      |  0  |  3  |  1
--     cherry      |  1  |  1  |  1
--     cherry      |  1  |  2  |  1
--     date        |  0  |  2  |  2
--     elderberry  |  0  |  3  |  1
--     elderberry  |  1  |  3  |  1

I'm still not sure if a simple match query over all terms in the document collection is effective enough - maybe there is a more direct way?

Merriott answered 11/3, 2012 at 17:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.