MySQL collation for all languages
H

5

26

I'm currently developing a website that will display content in almost any language in the world, and I'm having trouble choosing the best collation to set in MySQL.

Which one is the best to support all characters? Or the most accurate?

Or is it best to just convert all characters to Unicode?

Holography answered 20/9, 2009 at 11:40 Comment(0)
C
23

I generally use UTF-8 (8-bit UCS/Unicode Transformation Format), which works perfectly for any (well, most) languages:

utf8_general_ci

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

Cobden answered 20/9, 2009 at 12:8 Comment(1)
I'd like to suggest using utf8_unicode_ci instead of utf8_general_ci. For more information about why unicode is better than general, see #767309 - Emelyemelyne
B
36

The accepted answer is wrong (maybe it was right in 2009).

utf8mb4_unicode_ci is the best collation to use for wide language support.

Reasoning and supporting evidence:

You want to use utf8mb4 rather than utf8 because the latter only supports characters of up to 3 bytes, and you want to support 4-byte characters (emoji, for example) as well. (ref)

and

You want to use the unicode collation rather than general because the latter has never sorted correctly. (ref)
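
To see why the 3-byte limit matters, here is a minimal Python sketch (not MySQL code; the helper name `needs_utf8mb4` is made up for illustration). It checks whether a string contains any character whose UTF-8 encoding takes 4 bytes, which is the kind of character the 3-byte utf8 character set cannot store:

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in text takes 4 bytes in UTF-8."""
    # Hypothetical helper for illustration; it only inspects the UTF-8
    # encoding and does not talk to MySQL at all.
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


# ASCII, Latin with an accent, Japanese, and an emoji (U+1F600).
samples = ["hello", "na\u00efve", "\u65e5\u672c\u8a9e", "\U0001F600"]
for s in samples:
    sizes = [len(ch.encode("utf-8")) for ch in s]
    print(repr(s), sizes, needs_utf8mb4(s))
```

Only the emoji sample needs 4 bytes; the Japanese sample fits in 3 bytes per character.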

Boarer answered 7/3, 2019 at 16:1 Comment(1)
Thanks! But what is the disadvantage of doing this by default for every db / table? Does it use more space, or will it make my queries / searches less efficient compared to using the default MySQL setting (latin1, I guess)? - Signorino
B
0

Use utf8mb4 instead of utf8.

utf8mb4_general_ci => supports characters of 1, 2, 3, or 4 bytes

and

utf8_general_ci or utf8mb3_general_ci => supports characters of 1, 2, or 3 bytes

Each character only takes as much disk space as it requires.
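
As a rough illustration of "space as required", the Python sketch below compares the encoded size of a few strings in latin1 and UTF-8. It measures encoded bytes only, not MySQL's actual on-disk storage, and `encoded_size` is a hypothetical helper name:

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


# ASCII-only text costs the same in both encodings; accented Latin
# characters cost one extra byte each in UTF-8.
for s in ["hello", "h\u00e9llo"]:
    print(repr(s), encoded_size(s, "latin-1"), encoded_size(s, "utf-8"))

# These cannot be encoded in latin1 at all, so only UTF-8 is shown.
for s in ["\u65e5\u672c\u8a9e", "\U0001F600"]:
    print(repr(s), len(s), "characters,", encoded_size(s, "utf-8"), "bytes")
```

ASCII-only text takes one byte per character either way, so the extra cost of UTF-8 only appears for non-ASCII characters.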

Bijouterie answered 3/6, 2021 at 9:11 Comment(0)
I
0

Using utf8mb4_unicode_ci or utf8mb4_general_ci can be tricky and cause unexpected behavior.

Be aware.

Perhaps utf8mb4_bin (the binary collation for utf8mb4) can be a good option if you want to avoid cases like this one below.

enter image description here
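
One kind of surprise with case-insensitive collations is that strings differing only in case or accents compare as equal, while a binary collation treats them as different. The Python sketch below is only a rough stand-in for that behaviour: it is not MySQL's collation algorithm, and the names `ci_ai_key` and `bin_equal` are invented for illustration.

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Rough stand-in for a case- and accent-insensitive comparison key."""
    # Assumption: decompose, drop combining marks, then case-fold.
    # This only approximates what a *_ci collation does.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def bin_equal(a: str, b: str) -> bool:
    """Binary-style comparison: equal only if the encoded bytes match."""
    return a.encode("utf-8") == b.encode("utf-8")


a, b = "R\u00e9sum\u00e9", "resume"
print(ci_ai_key(a) == ci_ai_key(b))  # insensitive comparison
print(bin_equal(a, b))               # binary comparison
```

Under the insensitive key the two strings match; under the binary comparison they do not, which is the trade-off to weigh for columns such as usernames or unique keys.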

Istanbul answered 30/7, 2021 at 9:59 Comment(0)
S
0

From the MySQL website:

utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utf8mb4 instead.

utf8: An alias for utf8mb3. In MySQL 8.0, this alias is deprecated; use utf8mb4 instead. utf8 is expected in a future release to become an alias for utf8mb4.

So prefer utf8mb4.

Satinet answered 16/9, 2022 at 5:50 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. - Calfskin
