How to deal with accented characters in iOS SQLite?
Asked Answered
B

3

12

I need to perform a SELECT queries that are insensitive to case and accents. For demo purposes, I create a table like that:

create table table
(
  column text collate nocase
);

insert into table values ('A');
insert into table values ('a');
insert into table values ('Á');
insert into table values ('á');

create index table_cloumn_Index
  on table (column collate nocase);

Then, I get those results when executing the following queries:

SELECT * FROM table WHERE column LIKE 'a';
> A
> a

SELECT * FROM table WHERE column LIKE 'á';
> á

SELECT * FROM table WHERE column LIKE 'Á';
> Á

How can I fix that so the results for any of the following queries be like that:

> A
> a
> Á
> á

The sqlite is running on iOS, by the way.

Thanks in advance,

Barrows answered 27/4, 2012 at 18:39 Comment(0)
H
18

Two basic approaches:

  1. You can create a second column in the table which contains the string without the international characters. Furthermore, before doing a search against this secondary search column, you should also remove international characters from the string being search for, too (that way you are comparing non-international to non-international).

    This is the routine I use to convert the international characters:

    NSData *data = [string dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    

    You could also replace the accented characters with:

    NSMutableString *mutableString = [string mutableCopy];
    CFStringTransform((__bridge CFMutableStringRef)mutableString, NULL, kCFStringTransformStripCombiningMarks, NO);
    

    By the way, if you need to sort your results, you can also sort upon this secondary search field instead of the main field, which will avoid problems stemming from SQLite's inability to sort the international characters, either.

  2. You can alternatively create your own "unaccented" C function (define this C function outside the @implementation for your class):

    void unaccented(sqlite3_context *context, int argc, sqlite3_value **argv)
    {
        if (argc != 1 || sqlite3_value_type(argv[0]) != SQLITE_TEXT) {
            sqlite3_result_null(context);
            return;
        }
    
        @autoreleasepool {
            NSMutableString *string = [NSMutableString stringWithUTF8String:(const char *)sqlite3_value_text(argv[0])];
            CFStringTransform((__bridge CFMutableStringRef)string, NULL, kCFStringTransformStripCombiningMarks, NO);
            sqlite3_result_text(context, [string UTF8String], -1, SQLITE_TRANSIENT);
        }
    }
    

    You can then define a SQLite function that will call this C-function (call this method after you open the database, which will be effective until you close that database):

    - (void)createUnaccentedFunction
    {
        if (sqlite3_create_function_v2(database, "unaccented", 1, SQLITE_ANY, NULL, &unaccented, NULL, NULL, NULL) != SQLITE_OK)
            NSLog(@"%s: sqlite3_create_function_v2 error: %s", __FUNCTION__, sqlite3_errmsg(database));
    }
    

    Having done that, you can now use this new unaccented function in SQL, e.g.:

    if (sqlite3_prepare_v2(database, "select a from table where unaccented(column) like 'a'", -1, &statement, NULL) != SQLITE_OK)
        NSLog(@"%s: insert 1: %s", __FUNCTION__, sqlite3_errmsg(database));
    
Hereon answered 27/4, 2012 at 19:18 Comment(7)
Yeah, I was using this approach too, but now I need to deal with a database which I cannot change it's structure. But if you can, and your db isn't too huge, it's a easy way to go.Barrows
@LeandroAlves I know you've long since moved past this issue, but I (a) include another variation for unaccenting strings; and (b) show you how to unaccent strings in SQLite on the fly.Hereon
@Hereon The code for Option 2 executes and handles the data correctly, but takes the CPU to 99% and Memory > 1000 MB. The application crashes with EXC_BAD_ACCESS (code=2, address=0x0) immediately after the operations complete. Is there anything that can be done to control memory usage with this code?Septuagenarian
@ChrisSchiffhauer This will require some discussion, so I created a chat room for this follow-up question: chat.stackoverflow.com/rooms/41329/…Hereon
@ChrisSchiffhauer The memory problems that arise by doing this update on many, many records can be resolved by introducing of an @autoreleasepool. See revised answer.Hereon
Wow, if ever there was an answer that deserved a bounty, it's this one. Incoming in 24 hours.Septuagenarian
Credit to LS_dev for suggestion to use SQLITE_TRANSIENT. Much better.Hereon
M
4

You will need to either create some user function, or override (i.e. replace) the default implementation of the like() functions. The reason is that the LIKE operator in sqlite doesn't support non-ASCII case-insensitiveness:

SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.

This makes sense otherwise sqlite would need to support different cultures since case varies from one to the other. An example is the capital i in Turkey which is not I but a dotted İ, and the lower-case of I is a dot-less ı. Embedding all this culture information in sqlite would be very burdensome (i.e. it would increase sqlite object code).

Middaugh answered 29/4, 2012 at 10:21 Comment(1)
Did you already created any user function on iOS? Do you have any sample code about that? This seems to be the righter way of doing this.Barrows
D
2

Here is my solution of LIKE problem

static void myLow(sqlite3_context *context, int argc, sqlite3_value **argv)
{
    NSString* str = [[NSString alloc] initWithUTF8String:
                            (const char *)sqlite3_value_text(argv[0])];
    const char* s = [[str lowercaseString] UTF8String];
    sqlite3_result_text(context, s, strlen(s), NULL);
    [str release];
}

// call it once after opening db
sqlite3_create_function(_db, "myLow", 1, SQLITE_UTF8,NULL, &myLow, NULL, NULL);

And then instead of query

SELECT * FROM table WHERE column LIKE 'a'

you should use

SELECT * FROM table WHERE myLow(column) LIKE 'a'
Dalury answered 9/1, 2015 at 20:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.