strtolower() for unicode/multibyte strings
T

8

35

I have some text in a non-English language on my page, but when I try to make it lowercase, its characters are converted into black diamonds containing question marks.

$a = "Երկիր Ավելացնել";
echo $b = strtolower($a);
//returns  ����� ���������

I've set my charset in a metatag, but this didn't fix it.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

What can I do to convert my string to lowercase without corrupting it?

Tribute answered 25/3, 2010 at 14:46 Comment(3)
The Unicode monster strikes again! Here, have a link: joelonsoftware.com/articles/Unicode.html - Toxic
Is there even such a thing as lower case arabic or whatever that is? :P - Subcritical
Make sure to send a Content-Type header as well; in some browsers it takes priority over the meta tag. - Xiaoximena
B
77

Have you tried using mb_strtolower()?

Bridewell answered 25/3, 2010 at 14:47 Comment(1)
var_dump(mb_strtolower('ԱԱԱ', mb_detect_encoding('ԱԱԱ'))); // string(6) "աաա" 100% Working!!!! - Triiodomethane
X
22

PHP 5's built-in string functions are not UTF-8 aware, so you still need to resort to the mbstring extension. I suggest you set mbstring's internal encoding to UTF-8, so that you can use its functions without specifying the charset every time:

mb_internal_encoding('UTF-8');

...

$b = mb_strtolower($a);
echo $b;
Xiaoximena answered 25/3, 2010 at 14:49 Comment(0)
U
10

I found this solution here:

$string = 'Թ';
echo 'Uppercase: '.mb_convert_case($string, MB_CASE_UPPER, "UTF-8").PHP_EOL;
echo 'Lowercase: '.mb_convert_case($string, MB_CASE_LOWER, "UTF-8").PHP_EOL;
echo 'Original: '.$string.PHP_EOL;

It works for me (lowercase).

Underground answered 26/7, 2012 at 11:53 Comment(0)
G
5

Have you tried mb_strtolower() and specifying the encoding as the second parameter?

The examples on that page appear to work.

You could also try:

$str = mb_strtolower($str, mb_detect_encoding($str));
Galvanic answered 25/3, 2010 at 14:48 Comment(0)
P
3

PHP by default does not know about UTF-8. strtolower() treats a string as a sequence of single bytes and converts each byte on its own: according to the current locale before PHP 8.2, and only the ASCII letters A-Z from PHP 8.2 onwards. Non-ASCII letters in UTF-8 are written with two or more bytes, all of them outside the ASCII range, so under a single-byte locale any byte that happens to match an uppercase letter in that locale's character set gets converted separately. As a result the byte sequence is broken and no longer represents a valid character.
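To make the byte-level picture concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// First word from the question: 5 Armenian letters, saved as UTF-8.
$a = "Երկիր";

// Each Armenian letter takes two bytes in UTF-8.
echo strlen($a), PHP_EOL;              // number of bytes: 10
echo mb_strlen($a, 'UTF-8'), PHP_EOL;  // number of characters: 5
echo bin2hex($a), PHP_EOL;             // raw bytes: d4b5d680d5afd5abd680

// mb_strtolower() works on whole characters, so the result is still valid UTF-8.
$lower = mb_strtolower($a, 'UTF-8');
echo $lower, PHP_EOL;                  // երկիր
var_dump(mb_check_encoding($lower, 'UTF-8'));
```

None of those bytes is in the ASCII range, which is why a byte-by-byte conversion can only leave the text untouched or damage it.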

To change this you need to configure the mbstring extension:

http://www.php.net/manual/en/book.mbstring.php

either to replace strtolower with mb_strtolower (function overloading, which was deprecated in PHP 7.2 and removed in PHP 8.0), or use mb_strtolower directly. In any case, you need to spend some time configuring the mbstring settings to match your requirements.

Prong answered 25/3, 2010 at 14:53 Comment(0)
O
2

Use mb_strtolower instead, as strtolower doesn't work on multi-byte characters.

Ostensorium answered 25/3, 2010 at 14:48 Comment(1)
strtolower does actually work on multibyte characters, it just works off of the current locale, which is not usually what you want in these cases. - Justiciar
O
1

strtolower() will perform the conversion in the currently selected locale only.

I would try mb_convert_case(). Make sure you explicitly specify an encoding.
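Applied to the string from the question, that might look like the sketch below. It assumes the mbstring extension is available and that the file is saved as UTF-8.

```php
<?php
$a = "Երկիր Ավելացնել";

// Pass the encoding explicitly instead of relying on mbstring's internal default.
$lower = mb_convert_case($a, MB_CASE_LOWER, 'UTF-8');

echo $lower, PHP_EOL; // երկիր ավելացնել
```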

Osvaldooswal answered 25/3, 2010 at 14:49 Comment(0)
H
0

You will need to set the locale; see the first example at https://www.php.net/manual/en/function.strtolower.php

Haemoid answered 25/3, 2010 at 14:51 Comment(0)
