We are dealing with a strange bug on a Joyent Solaris server that never happened before (it doesn't happen on localhost or on two other Solaris servers with identical PHP configuration). Actually, I'm not sure whether we should be looking at PHP or at Solaris, or whether it is a software or a hardware problem...
I just want to post this in case somebody can point us in the right direction.
So, the problem seems to be in var_export() when dealing with "strange" (non-ASCII) characters.
Executing this in the CLI, we get the expected result on our localhost machines and on two of the servers, but not on the third one. All of them are configured to work with UTF-8.
$ php -r "echo var_export('ñu', true);"
Gives this in older servers and localhost (expected):
'ñu'
But on the server we are having problems with (PHP Version => 5.3.6), it adds \0 null characters whenever it encounters an "uncommon" character: è, á, ç, ... you name it.
'' . "\0" . '' . "\0" . 'u'
Any idea where we should be looking? Thanks in advance.
More info:
- PHP version 5.3.6.
- setlocale() is not solving anything.
- default_charset is UTF-8 in php.ini.
- mbstring.internal_encoding is set to UTF-8 in php.ini.
- mbstring.func_overload = 0.
- this happens in both the CLI (example) and the web application (php-fpm + nginx).
- iconv encoding is also UTF-8.
- all files are UTF-8 encoded.
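To double-check that the values listed above are the ones PHP actually uses at runtime (and not just what is written in a php.ini that may not be the one loaded), something like the following sketch could be run on each server. It only uses standard functions; the list of directives is taken from the points above, and the iconv part is skipped if that extension is not loaded.

```php
<?php
// Sketch: print the encoding-related settings as seen at runtime.
$directives = array(
    'default_charset',
    'mbstring.internal_encoding',
    'mbstring.func_overload',
);

foreach ($directives as $name) {
    $value = ini_get($name);
    // ini_get() returns false when the directive does not exist in this build.
    echo $name, ' = ', ($value === false ? '(not available)' : $value), "\n";
}

// Which php.ini was loaded, if any (false when none was loaded).
$ini = php_ini_loaded_file();
echo 'php.ini in use = ', ($ini === false ? '(none)' : $ini), "\n";

// iconv is an optional extension, so guard the call.
if (function_exists('iconv_get_encoding')) {
    echo 'iconv internal_encoding = ', iconv_get_encoding('internal_encoding'), "\n";
}
```

Running it through both the CLI and php-fpm would show whether the two SAPIs really share the same configuration.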
system('locale') returns:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
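The output above is what the shell reports, which is not necessarily the locale that the PHP process itself has in effect. A minimal sketch to compare the two, assuming nothing beyond standard getenv() and setlocale() (passing "0" as the locale only queries the current value without changing it):

```php
<?php
// Sketch: environment variables versus the LC_CTYPE locale PHP is really using.
foreach (array('LANG', 'LC_ALL', 'LC_CTYPE') as $name) {
    $value = getenv($name);
    echo $name, ' (environment) = ', ($value === false ? '(unset)' : $value), "\n";
}

// "0" means: do not change anything, just return the current setting.
$inEffect = setlocale(LC_CTYPE, '0');
echo 'LC_CTYPE (in effect) = ', $inEffect, "\n";
```

If the value in effect differs from the environment, the locale-aware string functions are working with something other than en_US.UTF-8.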
Some of the tests done so far (CLI):
Normal behaviour:
$ php -r "echo bin2hex('ñu');" => 'c3b175'
$ php -r "echo mb_strtoupper('ñu');" => 'ÑU'
$ php -r "echo serialize(\"\\xC3\\xB1\");" => 's:2:"ñ";'
$ php -r "echo bin2hex(addcslashes(b\"\\xC3\\xB1\", \"'\\\\\"));" => 'c3b1'
$ php -r "echo ucfirst('iñu');" => 'Iñu'
Not normal:
$ php -r "echo strtoupper('ñu');" => 'U'
$ php -r "echo ucfirst('ñu');" => '?u'
$ php -r "echo ucfirst(b\"\\xC3\\xB1u\");" => '?u'
$ php -r "echo bin2hex(ucfirst('ñu'));" => '00b175'
$ php -r "echo bin2hex(var_export('ñ', 1));" => '2727202e20225c3022202e202727202e20225c3022202e202727'
$ php -r "echo bin2hex(var_export(b\"\\xC3\\xB1\", 1));" => '2727202e20225c3022202e202727202e20225c3022202e202727'
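To see how widespread the damage is, rather than testing one character at a time, a small scan over all high bytes can help. This is only a diagnostic sketch; the helper name high_bytes_changed_by_strtoupper() is made up for this example. It assumes a PHP version (such as the 5.3.6 here) in which strtoupper() follows the current locale.

```php
<?php
// Hypothetical helper: list every byte 0x80-0xFF that strtoupper() alters
// under the locale currently in effect, as "input hex => output hex".
function high_bytes_changed_by_strtoupper()
{
    $changed = array();
    for ($b = 0x80; $b <= 0xFF; $b++) {
        $in  = chr($b);
        $out = strtoupper($in);
        if ($out !== $in) {
            $changed[] = bin2hex($in) . ' => ' . bin2hex($out);
        }
    }
    return $changed;
}

$changed = high_bytes_changed_by_strtoupper();
if (count($changed) === 0) {
    echo "no high bytes altered\n";
} else {
    echo implode("\n", $changed), "\n";
}
```

Any line ending in "=> 00" would be a byte that gets turned into NUL, like the c3 in the ucfirst('ñu') test above.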
So the problem seems to be in var_export() and in "string functions that use the current locale but operate byte-by-byte" (docs; see @hakre's answer).
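One way to test the locale hypothesis is to run the same call with LC_CTYPE temporarily switched to the plain "C" locale and compare the bytes. The sketch below does that; ucfirst_hex_in_locale() is a hypothetical helper written for this post, not an existing PHP function.

```php
<?php
// Hypothetical helper: hex dump of ucfirst($input) with LC_CTYPE temporarily
// switched to $locale. Returns null if that locale cannot be set.
function ucfirst_hex_in_locale($locale, $input)
{
    $previous = setlocale(LC_CTYPE, '0');   // "0" only queries the current value
    if (setlocale(LC_CTYPE, $locale) === false) {
        return null;
    }
    $hex = bin2hex(ucfirst($input));
    setlocale(LC_CTYPE, $previous);         // restore the original setting
    return $hex;
}

$input = "\xC3\xB1u"; // UTF-8 bytes of 'ñu'
echo 'inherited locale: ', bin2hex(ucfirst($input)), "\n";
echo 'C locale:         ', ucfirst_hex_in_locale('C', $input), "\n";
```

In the "C" locale only a-z are uppercased, so a healthy build should leave the bytes as c3b175. If the affected server still prints 00b175 there, the locale setting alone does not explain the problem.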
locale(1) and/or checking the environment variables that start with LC. – Bart
NUL bytes? – Agnosia
php.ini's default_charset is UTF-8. – Swatter
ISO-8859-1. I changed it to UTF-8 some days ago in php.ini but the problem remained, so I reverted it to the original configuration... – Swatter
UTF-8 doesn't change a thing, both var_export() and ucfirst() don't work. – Swatter
mb_detect_encoding() – Giraldo
php.ini, and mbstring.internal_encoding is set to UTF-8. mb_detect_encoding('ñu') returns UTF-8. – Swatter
mb_detect_encoding(ucfirst('ñu'))? – Giraldo
mb_detect_encoding(ucfirst('ñu')) returns false. – Swatter
mb_detect_encoding is broken by design. Don't rely on that function, its outcome does not say much. Handle with care. – Roband
strtoupper must be working. Get proper binaries. – Roband