Is SQLite on Android built with the ICU tokenizer enabled for FTS?

As the title says: can we use ... USING fts3(tokenize=icu th_TH, ...)? If so, does anyone know which locales are supported, and whether that varies by platform version?

Colson answered 15/8, 2011 at 20:9 Comment(0)

No, only tokenize=porter is available.

When I specify tokenize=icu, I get "android.database.sqlite.SQLiteException: unknown tokenizer: icu".

Also, this link suggests that if Android didn't compile ICU support in by default, it will not be available: http://sqlite.phxsoftware.com/forums/t/2349.aspx

Papotto answered 18/11, 2011 at 14:21 Comment(4)
Thanks. This confirms my suspicions. Too bad. :( - Colson
I reported this issue more than 2 years ago code.google.com/p/android/issues/detail?id=9199 but Google keeps prioritizing live wallpapers with waves on touch, shutdown animations, insecure unlock methods (face unlock), ... over issues like this. - Warring
@gregm, the link is dead. - Selfconceit
The link that Eduardo posted indicates that ICU is available from Android API level 21 (Lollipop). - Movement

On API level 21 and up, I tested and found that the ICU tokenizer is already available.
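One way to act on this is to choose the tokenizer clause from the device's API level. The following is only a minimal sketch: the cutoff of 21 is taken from the test reported above and is not guaranteed for every device, and the class, method, table and column names are hypothetical. The fallback to the built-in simple tokenizer is also an assumption, chosen because it splits on whitespace and so pairs with pre-spaced text.

```java
// Hypothetical helper; not part of the Android SDK.
final class FtsSchema {
    // Assumption: ICU tokenizer is present from API level 21, as reported in this thread.
    static final int FIRST_API_WITH_ICU = 21;

    /** Builds a CREATE VIRTUAL TABLE statement, using ICU only where it is assumed to exist. */
    static String createTableSql(String table, String column, int apiLevel, String icuLocale) {
        String tokenizer = apiLevel >= FIRST_API_WITH_ICU
                ? "icu " + icuLocale   // e.g. "icu th_TH"
                : "simple";            // built-in fallback tokenizer
        return "CREATE VIRTUAL TABLE " + table
                + " USING fts3(" + column + ", tokenize=" + tokenizer + ")";
    }
}
```

On a device you would pass Build.VERSION.SDK_INT as apiLevel. Catching the SQLiteException for "unknown tokenizer" when creating the table is a more direct check if you do not want to rely on the cutoff.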

However, to support 90%+ of devices, a workaround is needed. I have a workaround idea, which I also mention in another question of mine: Work around of Android SQLite full-text search for Asian text

You can port the ICU tokenizer into Java, or into a native Android module, as a stand-alone component that is not hooked into SQLite directly. Then use an "external content" table (supported from FTS4 on), so that the virtual table only holds the index while the text itself lives in an ordinary table.

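When adding a row, store the normal content in the external content table, but run the stand-alone tokenizer to insert artificial spaces at word boundaries before adding the text to the virtual index table.

When deleting a row, run the tokenizer again and update the content table with the artificially spaced text, then delete the row from the virtual table, then delete the row from the content table. The update is needed because FTS4 reads the row back from the content table to work out which index entries to remove, so at that moment the content table must hold the same text that was indexed.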

This is a little tricky, but compared with the other option of recompiling a full SQLite build, it is much less effort.

For external content tables and how they work, please refer to https://www.sqlite.org/fts3.html#section_6_2_2
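The insert and delete order described above can be sketched as a list of SQL statements. This is an illustration under stated assumptions, not tested on a device: the table names docs and docs_fts, the column body, and the class itself are hypothetical, and the spaced text is expected to come from your own stand-alone tokenizer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical schema and statement order for an FTS4 external content table.
final class ExternalContentFtsSql {
    // Ordinary table that stores the original text.
    static final String CREATE_CONTENT =
            "CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT)";
    // FTS4 index that stores no text of its own and reads rows from docs.
    static final String CREATE_INDEX =
            "CREATE VIRTUAL TABLE docs_fts USING fts4(content=\"docs\", body)";

    /** Statements to run, in order, when adding a row. */
    static List<String> insertStatements() {
        return Arrays.asList(
                "INSERT INTO docs(id, body) VALUES(?, ?)",          // bind: id, original text
                "INSERT INTO docs_fts(docid, body) VALUES(?, ?)");  // bind: id, spaced text
    }

    /** Statements to run, in order, when deleting a row. */
    static List<String> deleteStatements() {
        return Arrays.asList(
                "UPDATE docs SET body = ? WHERE id = ?",   // bind: spaced text, id
                "DELETE FROM docs_fts WHERE docid = ?",    // index entries removed using spaced text
                "DELETE FROM docs WHERE id = ?");          // finally drop the content row
    }
}
```

On Android each statement could be run with SQLiteDatabase.execSQL(sql, bindArgs), ideally with each group wrapped in a single transaction so the content table and the index cannot drift apart.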

An ICU-backed tokenizer is in fact already available in the Android SDK: use BreakIterator.getWordInstance. It appears to even support dictionary-based tokenization for languages such as Chinese. http://developer.android.com/reference/java/text/BreakIterator.html
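A minimal sketch of the "artificial spaces" step using BreakIterator.getWordInstance follows. The class and method names are hypothetical, and dropping segments that contain no letter or digit (punctuation, whitespace) is my own assumption about what should reach the index. Segmentation of Thai or Chinese text depends on the BreakIterator implementation, so results on a desktop JVM may differ from those on Android.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper that prepares text for a whitespace-splitting FTS tokenizer.
final class WordSpacer {
    /** Returns the words found by BreakIterator, joined by single spaces. */
    static String spaceWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        StringBuilder out = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Keep only segments that look like words; skip spaces and punctuation.
            if (containsLetterOrDigit(segment)) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(segment);
            }
        }
        return out.toString();
    }

    private static boolean containsLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }
}
```

For example, WordSpacer.spaceWords("Hello, world!", Locale.ENGLISH) yields "Hello world". For Thai you would pass something like new Locale("th", "TH"), and the same function should be applied to search queries so that they are split the same way as the indexed text.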

Begot answered 21/10, 2015 at 18:25 Comment(0)

I have some Android code that uses tokenization at the link below; maybe it will be of some help:

https://github.com/gast-lib/gast-lib/blob/master/app/src/root/gast/playground/speech/food/db/FtsIndexedFoodDatabase.java

Papotto answered 17/4, 2015 at 14:49 Comment(0)
