Does MySQL REGEXP support Unicode matching?

3

5

Does anyone know if MySQL's REGEXP supports Unicode? I've been doing some research, and the majority of blogs etc. seem to indicate that there is a problem or that it's not supported. Is it then best to use LIKE for Unicode pattern matching and REGEXP for more advanced ASCII-only pattern matching?

I like the idea of being able to search for matches at the beginning or end of a string, but if REGEXP doesn't support Unicode then this could be difficult when my text is Unicode.

Schlesinger asked 16/1, 2013 at 10:28
7
  1. Does anyone know if MySQL's REGEXP supports Unicode? I've been doing some research, and the majority of blogs etc. seem to indicate that there is a problem or that it's not supported.

    As documented under Regular Expressions:

    Warning

    The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

  2. Is it then best to use LIKE for Unicode pattern matching and REGEXP for more advanced ASCII-only pattern matching?

    Yes, that would be best.

  3. I like the idea of being able to search for matches at the beginning or end of a string, but if REGEXP doesn't support Unicode then this could be difficult when my text is Unicode.

    One can do that with LIKE too:

    WHERE foo LIKE 'bar%'
    

    And:

    WHERE foo LIKE '%bar'
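
To make the quoted warning concrete, here is a minimal sketch of why byte-wise matching misbehaves on multi-byte text. It is written in Python rather than SQL and does not use MySQL's regex engine; it only assumes that Python's `re` applied to UTF-8 `bytes` is a fair stand-in for a matcher that sees bytes instead of characters. The variable names are illustrative.

```python
import re

text = "\u00e9"              # "é": one character
raw = text.encode("utf-8")   # two bytes in UTF-8: 0xC3 0xA9

# Character-wise: "." consumes the whole character, so a single "." matches.
char_match = re.fullmatch(r".", text) is not None

# Byte-wise: "." consumes one byte, so a single "." cannot cover "é",
# but two of them can.
byte_match = re.fullmatch(rb".", raw, re.DOTALL) is not None
two_byte_match = re.fullmatch(rb"..", raw, re.DOTALL) is not None

# Byte-wise character class: "[é]" becomes a class of two separate bytes,
# so it also "matches" a different character that shares the lead byte.
byte_class = "[\u00e9]".encode("utf-8")   # b"[\xc3\xa9]"
other = "\u00e3".encode("utf-8")          # "ã" is 0xC3 0xA3
false_positive = re.search(byte_class, other) is not None

print(char_match, byte_match, two_byte_match, false_positive)
```

The last case is the kind of "unexpected result" the warning describes: a pattern that looks like it tests for one accented character ends up testing for individual bytes.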
    
Puttyroot answered 16/1, 2013 at 10:35
3

Starting with MariaDB 10.0.5:

REGEXP/RLIKE, and the new functions REGEXP_REPLACE(), REGEXP_INSTR() and REGEXP_SUBSTR(), now work correctly with all multi-byte character sets supported by MariaDB, including East-Asian character sets (big5, gb2312, gbk, eucjp, eucjpms, cp932, ujis, euckr), and Unicode character sets (utf8, utf8mb4, ucs2, utf16, utf16le, utf32). In earlier versions of MariaDB (and all MySQL versions) REGEXP/RLIKE works correctly only with 8-bit character sets.

Dimetric answered 26/5, 2017 at 23:57
0

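Starting with MySQL 8.0, Unicode matching is supported: as of 8.0.4, the regular expression functions are implemented with ICU (International Components for Unicode), which is multibyte safe.

See also the documentation for compatibility issues with the earlier implementation.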

Twocolor answered 25/10, 2019 at 8:55
