Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?
217

I have a PHP script that deals with a wide variety of languages. Unfortunately, whenever I try to use json_encode, any Unicode output is converted to hexadecimal entities. Is this the expected behavior? Is there any way to convert the output to UTF-8 characters?

Here's an example of what I'm seeing:

INPUT

echo $text;

OUTPUT

База данни грешка.

INPUT

echo json_encode($text);

OUTPUT

"\u0411\u0430\u0437\u0430 \u0434\u0430\u043d\u043d\u0438 \u0433\u0440\u0435\u0448\u043a\u0430."
Phosphocreatine asked 11/5, 2013 at 14:44
496

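Yes, this is the default behavior: json_encode() escapes non-ASCII characters as \uXXXX sequences, which is still valid JSON. Since PHP 5.4.0 there is an option called JSON_UNESCAPED_UNICODE that keeps these characters as literal UTF-8 instead. Check it out: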

https://php.net/function.json-encode

Therefore you should try:

json_encode( $text, JSON_UNESCAPED_UNICODE );
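
As a minimal sketch (assuming PHP 5.4 or later and a source file saved as UTF-8), the snippet below encodes the string from the question both ways and decodes each result. Both forms are valid JSON and decode back to the same string, so the flag only changes how the output looks, not what it means. The variable names are illustrative.

```php
<?php
$text = 'База данни грешка.';

// Default: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($text);

// With the flag: characters are written as literal UTF-8.
$raw = json_encode($text, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";
echo $raw, "\n"; // "База данни грешка."

// Both representations decode to the original string.
var_dump(json_decode($escaped) === $text); // bool(true)
var_dump(json_decode($raw) === $text);     // bool(true)
```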
Heterologous answered 11/5, 2013 at 14:46 Comment(3)
Aha. Thanks! I should have read the documentation more carefully. Thanks. - Phosphocreatine
JSON_UNESCAPED_UNICODE was introduced in PHP 5.4.0, and is unavailable in earlier versions. When using it in earlier versions you will get this error: "Warning: json_encode() expects parameter 2 to be long, string given in ...". See CertaiN's answer below for 5.3 solution. - Lindsay
This also works with Danish letters Æ,æ,Ø,ø,Å,å Thank you! - Crazed
64

JSON_UNESCAPED_UNICODE is available on PHP Version 5.4 or later.
The following code is for Version 5.3.

UPDATED

  • html_entity_decode is a bit more efficient than pack + mb_convert_encoding.
  • (*SKIP)(*FAIL) skips escaped backslashes themselves, as well as the characters escaped by the JSON_HEX_* flags.

 

function raw_json_encode($input, $flags = 0) {
    $fails = implode('|', array_filter(array(
        '\\\\',
        $flags & JSON_HEX_TAG ? 'u003[CE]' : '',
        $flags & JSON_HEX_AMP ? 'u0026' : '',
        $flags & JSON_HEX_APOS ? 'u0027' : '',
        $flags & JSON_HEX_QUOT ? 'u0022' : '',
    )));
    $pattern = "/\\\\(?:(?:$fails)(*SKIP)(*FAIL)|u([0-9a-fA-F]{4}))/";
    $callback = function ($m) {
        return html_entity_decode("&#x$m[1];", ENT_QUOTES, 'UTF-8');
    };
    return preg_replace_callback($pattern, $callback, json_encode($input, $flags));
}
Soekarno answered 11/5, 2013 at 15:26 Comment(5)
Shouldn't the \u be \U i.e. uppercase? - Transgress
Nice solution for PHP < 5.4 ;) - Quinonoid
I was looking for 3 days to find this solution for Version 5.3 as my host didn't upgrade to 5.4. To me you are a Life saver and for being so complete I would rather mark this as accepted answer! - Woodwork
Fixed bug when string contains \\ . Newer version grabs \\ on higher priority than \u. - Soekarno
This should be added in the php library. Good Job. - Davide
15

You may want to set the charset in the response header and use unescaped Unicode:

header('Content-Type: application/json; charset=utf-8');
echo json_encode($data, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
Nigh answered 25/9, 2018 at 17:11
6

One solution is to first encode data and then decode it in the same file:

$string = json_encode($input, JSON_UNESCAPED_UNICODE);
echo $decoded = html_entity_decode($string);
Earwax answered 3/5, 2018 at 21:45
5

Here is my combined solution for various PHP versions.

At my company we work with several servers running different PHP versions, so I had to find a solution that works on all of them.

// version_compare() avoids parsing the version string by hand,
// which would misread versions such as 5.10 or 10.0.
if (version_compare(PHP_VERSION, '5.4.0', '>=')) {
  $encodedValue = json_encode($value, JSON_UNESCAPED_UNICODE);
} else {
  // Fallback for PHP < 5.4: convert each \uXXXX escape back to UTF-8.
  $encodedValue = preg_replace('/\\\\u([a-f0-9]{4})/e', "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", json_encode($value));
}

Credits should go to Marco Gasi & abu. The solution for PHP >= 5.4 is provided in the json_encode docs.

Hormonal answered 29/4, 2019 at 12:56
1
json_encode($text, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
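
To make the effect of combining the two flags concrete, here is a small sketch (assuming PHP 5.4 or later and a UTF-8 source file). The sample array and its keys are made up for illustration. By default json_encode() escapes both forward slashes and non-ASCII characters; with both flags set, neither is escaped.

```php
<?php
// Hypothetical sample data: a URL and a word with a non-ASCII letter.
$data = array('url' => 'https://php.net/', 'text' => 'Æble');

$default = json_encode($data);
$plain = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

echo $default, "\n"; // {"url":"https:\/\/php.net\/","text":"\u00c6ble"}
echo $plain, "\n";   // {"url":"https://php.net/","text":"Æble"}
```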
Jaquelynjaquenetta answered 2/4, 2019 at 17:30
-2

The raw_json_encode() function above did not solve the problem for me (for some reason, the callback function raised an error on my PHP 5.2.5 server).

But this other solution did actually work.

https://www.experts-exchange.com/questions/28628085/json-encode-fails-with-special-characters.html

Credits should go to Marco Gasi. I just call his function instead of calling json_encode():

function jsonRemoveUnicodeSequences( $json_struct )
{ 
    return preg_replace( "/\\\\u([a-f0-9]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", json_encode( $json_struct ) );
}
Faizabad answered 18/1, 2019 at 17:8
-6

Is this the expected behavior?

json_encode() only works with UTF-8 encoded data, so the input must be valid UTF-8 before it is encoded.

If your data is in another encoding, you may find a way to convert it here: cyrillic-characters-in-phps-json-encode
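
As a rough sketch of what converting the data can look like, the snippet below assumes the input arrives as Windows-1251 bytes (a guess at a typical legacy Cyrillic encoding, not something stated in the question) and that the iconv extension is available. On current PHP versions json_encode() returns false for input that is not valid UTF-8; after conversion with iconv() it succeeds.

```php
<?php
// The word "База" as Windows-1251 bytes (assumed legacy source encoding).
$legacy = "\xC1\xE0\xE7\xE0";

// Not valid UTF-8, so encoding fails and the error code is JSON_ERROR_UTF8.
$failed = json_encode($legacy);
$wasUtf8Error = (json_last_error() === JSON_ERROR_UTF8);

// Convert to UTF-8 first, then encode.
$utf8 = iconv('Windows-1251', 'UTF-8', $legacy);
$json = json_encode($utf8, JSON_UNESCAPED_UNICODE);

var_dump($failed);       // bool(false)
var_dump($wasUtf8Error); // bool(true)
echo $json, "\n";        // "База"
```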

Megohm answered 11/5, 2013 at 15:33