Workaround for Android SQLite full-text search with Asian text

2

0

I have read many posts asking whether SQLite-based full-text search can be done on Android, and all the answers point out that Android's built-in SQLite does not allow a custom tokenizer. The default tokenizer treats words as runs of characters separated by spaces or other punctuation, but Asian languages such as Chinese are written without spaces between words and need a dedicated tokenizer, which Android does not let you add.

The posts I read are several years old. Has anything changed in recent Android versions? I searched but did not find an answer.

I am also considering a workaround. Would it be feasible to artificially insert spaces between the characters before I INSERT rows into the FTS3/FTS4 virtual table, so that the default tokenizer treats each Asian character (each "word") like an English word? The query string would get the same treatment, with the same artificial spaces added.
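The preprocessing step described above could be sketched as follows. This is a minimal illustration in Python, not code from any library: the helper names `is_cjk` and `space_out_cjk` are hypothetical, and the set of Unicode ranges is an assumption that covers only the most common CJK blocks.

```python
def is_cjk(ch):
    """Return True for characters in a few common CJK blocks (assumed, not exhaustive)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def space_out_cjk(text):
    """Surround every CJK character with spaces, leaving Latin words intact."""
    parts = []
    for ch in text:
        parts.append(" " + ch + " " if is_cjk(ch) else ch)
    # split()/join() collapses all whitespace runs (including newlines) to single spaces
    return " ".join("".join(parts).split())


spaced = space_out_cjk("这个Bug还没有解决")
```

The same function would be applied both to the text being indexed and to the user's query string, so that both sides are tokenized identically.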

I tried this on Linux and it seems to work. For example, with the following, full-text search works for Asian text:

CREATE VIRTUAL TABLE mail USING fts3(subject, body);
INSERT INTO mail(docid, subject, body) VALUES(4, 'software feedback', '这 个 Bug 还 没 有 解 决');
SELECT * FROM mail WHERE body MATCH '没 有 解 决';  
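One detail worth checking with this approach: a MATCH string of space-separated terms is treated as an implicit AND of the individual tokens, so the characters may match in any order and at any position. To require them to be adjacent and in order, the query can be wrapped in double quotes as a phrase query. The sketch below reproduces the example with Python's `sqlite3` module, assuming the SQLite build linked into Python has FTS3 enabled; the second row (docid 5) is made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE mail USING fts3(subject, body)")
conn.executemany(
    "INSERT INTO mail(docid, subject, body) VALUES(?, ?, ?)",
    [
        (4, "software feedback", "这 个 Bug 还 没 有 解 决"),
        (5, "other", "解 决 了 但 没 有 用"),  # same characters, different order
    ],
)


def search(match_expr):
    """Return the docids whose body matches the given FTS expression."""
    rows = conn.execute(
        "SELECT docid FROM mail WHERE body MATCH ? ORDER BY docid",
        (match_expr,),
    )
    return [row[0] for row in rows]


any_order = search("没 有 解 决")      # implicit AND of four single-character tokens
as_phrase = search('"没 有 解 决"')    # phrase query: tokens must be consecutive
```

With this data, the unquoted query matches both rows, while the phrase query matches only the row where the four characters appear consecutively.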

One doubt is whether this would cost much more storage in the database file, since the added spaces double the number of characters. It looks like the so-called "virtual table" stores not only the generated index but also the original text.

Lambkin answered 21/10, 2015 at 6:4 Comment(1)
I just found that FTS4 has features called "Contentless FTS4 Tables" and "External Content FTS4 Tables". A contentless table stores only the index, not the content, and an external content table lets the content be stored independently of the virtual table. So I think I can insert the text with artificial spaces into the virtual table and store the exact text in the external content table. It sounds like a good workaround. - Lambkin
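One way the idea in this comment might look is sketched below, using a contentless FTS4 table for the spaced text and an ordinary table for the exact text, linked by docid. It assumes a SQLite build with FTS4 and the `content=` option available; the table names `mail` and `mail_fts` and the helper `add_mail` are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary table: holds the exact, unmodified text.
conn.execute("CREATE TABLE mail(id INTEGER PRIMARY KEY, subject TEXT, body TEXT)")
# Contentless FTS4 table: holds only the full-text index, not the indexed text.
conn.execute('CREATE VIRTUAL TABLE mail_fts USING fts4(content="", subject, body)')


def add_mail(mail_id, subject, body, spaced_body):
    """Store the original text and index the space-separated version under the same id."""
    conn.execute(
        "INSERT INTO mail(id, subject, body) VALUES(?, ?, ?)",
        (mail_id, subject, body),
    )
    conn.execute(
        "INSERT INTO mail_fts(docid, subject, body) VALUES(?, ?, ?)",
        (mail_id, subject, spaced_body),
    )


add_mail(4, "software feedback", "这个Bug还没有解决", "这 个 Bug 还 没 有 解 决")

# Search the index, then fetch the original text by docid.
rows = conn.execute(
    "SELECT body FROM mail WHERE id IN "
    "(SELECT docid FROM mail_fts WHERE mail_fts MATCH ?)",
    ('"没 有 解 决"',),
).fetchall()
```

A trade-off of the contentless variant is that FTS4 contentless tables do not support UPDATE or DELETE, so changing a row means rebuilding the index.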
1

For API level 21 and up, I tested and found that the ICU tokenizer is already available.
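A simple way to test for a tokenizer is to try creating a throwaway FTS table that uses it and see whether SQLite rejects the statement. The sketch below does this on a desktop with Python's `sqlite3` module; the function name `tokenizer_available` is hypothetical, and on Android the same CREATE VIRTUAL TABLE statement would have to be issued through the platform's own database API instead.

```python
import sqlite3


def tokenizer_available(name):
    """Return True if this SQLite build accepts the given FTS4 tokenizer name.

    `name` is interpolated into the SQL text, so pass only trusted constants.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE probe USING fts4(body, tokenize=%s)" % name
        )
        return True
    except sqlite3.Error:
        # Unknown tokenizer, or FTS4 itself is not compiled into this build.
        return False
    finally:
        conn.close()


has_icu = tokenizer_available("icu")
```

Whether `has_icu` is true depends entirely on how the underlying SQLite library was compiled, so the result can differ between devices and platforms.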

For older devices, I found a workaround in another question: Is SQLite on Android built with the ICU tokenizer enabled for FTS?

Lambkin answered 2/11, 2015 at 18:48 Comment(0)
-1

Use the NDK to compile your own copy of SQLite; you can then do whatever you want with it.

Hoodmanblind answered 21/10, 2015 at 7:29 Comment(1)
If you've never used the NDK before, then of course it's going to be difficult. - Hoodmanblind
