How to best configure PHP to handle a UTF-8 website [duplicate]

Which extensions would you recommend, and how should PHP best be configured, to create a website that uses UTF-8 encoding for everything? E.g.:

  • Page output is UTF-8
  • Forms submit data encoded in UTF-8
  • Internal processing of string data (e.g. when talking to a database) is all in UTF-8 as well

It seems that PHP does not really cope well with multibyte character sets at the moment. So far I have worked out that mbstring looks like an important extension.

Is it worth the hassle?

Tallith answered 22/10, 2009 at 8:29 Comment(3)
I've successfully been using standard PHP installations with UTF-8 source files generating UTF-8 output, including special UTF-8 chars like ♕ ⚐ and ✔, since 4.1.x. :) - Donielle
Getting correct UTF-8 output doesn't prove that your code is parsing input correctly and is secured against malicious sequences. - Capelin
Update: throughout this Q&A, consider using utf8mb4 in MySQL instead of utf8. (Contrast with the non-MySQL term UTF-8.) - Responsive

The supposed issues of PHP with Unicode content have been somewhat overstated. I've been building multilingual websites since 1998 and never knew there might be an issue until I read about it somewhere, many years and websites later.

This works just fine for me:

Apache configuration (in httpd.conf or .htaccess)

AddDefaultCharset utf-8

PHP (in php.ini)

default_charset = "utf-8"
mbstring.internal_encoding=utf-8
mbstring.http_output=UTF-8
mbstring.encoding_translation=On
mbstring.func_overload=6 

MySQL

Create your database with a utf8_* collation, let the tables inherit the database collation, and start every connection with "SET NAMES utf8".

HTML (in HEAD element)

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
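
If you cannot edit httpd.conf or .htaccess, the same charset declaration can be sent from PHP itself. This is only a minimal sketch, assuming it runs before any output has been sent; the helper name send_utf8_header is made up for illustration:

```php
<?php
// Hypothetical helper: declare UTF-8 in the HTTP Content-Type header.
// Returns false when output has already started and headers can no longer be changed.
function send_utf8_header(): bool
{
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: text/html; charset=utf-8');
    return true;
}

send_utf8_header();
```

The HTTP header normally takes precedence over the meta tag, so it is worth checking that both say the same thing.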
Ers answered 22/10, 2009 at 22:55 Comment(12)
What does the "SET NAMES utf8" SQL statement actually do? - Tallith
Straight from the MySQL docs: "A SET NAMES 'x' statement is equivalent to these three statements: SET character_set_client = x; SET character_set_results = x; SET character_set_connection = x;" This is handy because no matter which charset you use to store the data, the data still has to travel to and from PHP. One might never notice a problem while using a single computer (as in HTML FORM -> MySQL -> page), but using a devel machine to populate a db and moving it to the prod server to output it is risky, as the two may well have different client charsets. SET NAMES means portability. - Ers
Can you still use PHP's string functions, or do you have to use the mb_ ones? - Distaste
Here's how I created my database: CREATE DATABASE <DBNAME> CHARACTER SET utf8 COLLATE utf8_general_ci; - Maledict
Do not use SET NAMES, because it doesn't update the charset used by real_escape_string. See #1317652 - Capelin
@Ers If I could, I'd give you multiple +1s! Thanks! - Seligman
@Ers Can you please explain what mbstring.func_overload=6 does? I couldn't find the value 6 here: php.net/manual/en/mbstring.overload.php - Easy
What is mbstring.func_overload=6? 6 isn't even listed as an option. - Malaria
mbstring.func_overload = 6 is mbstring.func_overload = 4 and mbstring.func_overload = 2 combined, because the 1, 2, 4 options are bitmasks. Quoting from the PHP docs that you linked: "To use function overloading, set mbstring.func_overload in php.ini to a positive value that represents a combination of bitmasks specifying the categories of functions to be overloaded." The docs then give several examples of combinations. - Pedo
Yes, it works :) Make sure you set UTF-8 everywhere: HTML, PHP, MySQL, etc. Thanks for the answer. I am going to add my answer for CodeIgniter. - Dwarf
utf8mb4 for MySQL, please - Mellophone
mbstring.func_overload=6 has been deprecated. - Philoctetes
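
To make the bitmask point from the comments concrete, here is a small sketch that decodes a func_overload value into the categories it switches on. The function name and the labels are my own; only the bit values 1, 2 and 4 come from the PHP manual. Keep in mind that the setting was deprecated in PHP 7.2 and removed in PHP 8.0, so this is for understanding old configs, not a recommendation.

```php
<?php
// Bit flags documented for mbstring.func_overload; the labels are informal.
const OVERLOAD_FLAGS = [
    1 => 'mail()',
    2 => 'string functions (strlen, substr, ...)',
    4 => 'regular expression functions (ereg, split, ...)',
];

// Return the categories enabled by a given func_overload value.
function describe_func_overload(int $value): array
{
    $enabled = [];
    foreach (OVERLOAD_FLAGS as $bit => $label) {
        if (($value & $bit) !== 0) {
            $enabled[] = $label;
        }
    }
    return $enabled;
}

// 6 = 2 + 4, so this lists the string and regular expression categories.
print_r(describe_func_overload(6));
```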

I was facing the same issue with UTF-8 characters. Everything worked on the live and staging servers, but sometimes it broke on my dev machine. The behavior was strange: sometimes the characters were encoded properly, but on a random page reload they broke, showing diamond characters '���เห็นอเวิลด์!���' or question marks '??�เห็นอเวิลด์!???'; or about 85% of the data rendered properly 'เห็นอเวิลด์!?��' while the remaining 15% showed mismatched characters. To track the problem down, I started with this checklist:

1 - Check that the charset header is present in the HTML
2 - Check that the data is saved properly in the MySQL table
3 - Check that MySQL has the proper encoding settings for UTF-8
4 - Check that Apache is set up to deal with the UTF-8 character set
5 - Check that plain PHP can echo "เห็นอเวิลด์" with the output the same as the input "เห็นอเวิลด์"
6 - Check that PHP is sending the proper headers
7 - Check that the MySQL query returns the same data "เห็นอเวิลด์"
8 - Check whether "เห็นอเวิลด์" contains HTML characters, and deal with them properly
9 - Check whether "เห็นอเวิลด์" passes through any HTML encode/decode function
10 - Check that .htaccess is set up to deal with the UTF-8 character set

Go through the whole list above to figure out where something is breaking.
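
When walking through a checklist like this, it helps to look at the raw bytes at each stage instead of trusting what the browser shows. A rough sketch of such a probe, assuming the mbstring extension is loaded; inspect_utf8 is a hypothetical name, not something from CodeIgniter:

```php
<?php
// Hypothetical diagnostic helper for tracking down where text gets corrupted.
function inspect_utf8(string $text): array
{
    return [
        'valid_utf8' => mb_check_encoding($text, 'UTF-8'),
        'bytes'      => strlen($text),
        'hex'        => bin2hex($text),
    ];
}

// The Thai character U+0E40 is three bytes in UTF-8: e0 b9 80.
print_r(inspect_utf8("\u{0E40}"));

// The same character with its last byte cut off is no longer valid UTF-8.
print_r(inspect_utf8("\xE0\xB9"));
```

Calling it right after reading from the database, and again right before output, should show which step changes the bytes.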

Give this a try (I am using CodeIgniter):

=================================
:: PHP ini Settings::
=================================

default_charset = "utf-8"
mbstring.internal_encoding=utf-8
mbstring.http_output=UTF-8
mbstring.encoding_translation=On
mbstring.func_overload=6 

=================================
:: .htaccess Settings::
=================================

DefaultLanguage en-US
AddDefaultCharset UTF-8

=================================
:: HTML Header Page::
=================================

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

=================================
:: PHP Codeigniter index.php ::
=================================

header('Content-Type: text/html; charset=UTF-8');

=================================
:: Codeigniter config.php ::
=================================

$config['charset'] = 'UTF-8';

=================================
:: Codeigniter database.php ::
=================================

$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';

=================================
:: Codeigniter helper function (optional)
=================================

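if (!function_exists('safe_utf_string')) {
    // Escape a string for HTML output and make sure the result is valid UTF-8.
    function safe_utf_string($utf8string = '') {
        // ENT_SUBSTITUTE keeps invalid bytes from turning the whole result into ''.
        $utf8string = htmlspecialchars($utf8string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        // Name UTF-8 as the source encoding too, instead of relying on the internal encoding.
        return mb_convert_encoding($utf8string, 'UTF-8', 'UTF-8');
    }
}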
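
One caveat with htmlspecialchars() that is easy to miss: when flags are passed explicitly and the input contains a byte sequence that is not valid UTF-8, the function returns an empty string unless ENT_SUBSTITUTE (or ENT_IGNORE) is among the flags. A short illustration, using a Latin-1 encoded string as the "bad" input:

```php
<?php
// "café" in ISO-8859-1: the lone byte 0xE9 is not valid UTF-8.
$broken = "caf\xE9";

// Without ENT_SUBSTITUTE the whole result is an empty string.
$strict = htmlspecialchars($broken, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE the invalid byte becomes U+FFFD and the rest survives.
$lenient = htmlspecialchars($broken, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');              // bool(true)
var_dump($lenient === "caf\u{FFFD}");  // bool(true)
```

An unexpectedly empty field on a page is therefore often a sign that non-UTF-8 data reached the escaping step.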

And finally, don't forget to say thanks :) for @djn's answer.

Dwarf answered 26/5, 2017 at 11:26 Comment(1)
You may need utf8mb4 instead of utf8 in MySQL. Can you provide the hex for the characters that became black diamonds, or for the characters that should have been there? When the hex is 4 bytes (F0xxyyzz), utf8 will not suffice; utf8mb4 is required. - Responsive
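
To check for the 4-byte case mentioned in this comment without reading hex dumps by hand, something along these lines might do. It is a sketch only; needs_utf8mb4 is a hypothetical name, and it relies on PCRE's /u modifier to match code points above U+FFFF, which are exactly the ones that take four bytes in UTF-8:

```php
<?php
// Hypothetical check: does the string contain code points above U+FFFF?
// Those are encoded as four bytes in UTF-8 and do not fit MySQL's 3-byte utf8.
function needs_utf8mb4(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

var_dump(needs_utf8mb4("\u{0E40}"));   // Thai character, 3 bytes: bool(false)
var_dump(needs_utf8mb4("\u{1F600}"));  // emoji, 4 bytes: bool(true)
```

Note that the function also returns false for input that is not valid UTF-8 at all, so it is best combined with a validity check.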

PHP copes just fine!

You should set the php.ini "default_charset" parameter to 'utf-8'.

Then make sure that:

<head>
  <meta http-equiv="Content-Type"
    content="text/html; charset=utf-8"
    />

is at the top of every page you serve.

There are a few problem areas:

Databases -- make sure they are configured to use utf-8 by default or enter a world of pain.

IDEs/Editors -- a lot of editors don't support UTF-8 well. I normally use vim, which doesn't, but it's never been a big problem.

Documents -- just spent a whole afternoon getting php to read Thai characters out of a spreadsheet. I was eventually successful but am still not sure what I did right.
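
For the documents case, the usual culprit is a file saved in some legacy encoding, which has to be converted once on the way in. The answer does not say which encoding the spreadsheet used, so the sketch below uses ISO-8859-1 purely as a stand-in; to_utf8 is a hypothetical helper, and the real source encoding has to be known, since guessing it is unreliable:

```php
<?php
// Hypothetical helper: convert text from a known legacy encoding to UTF-8.
function to_utf8(string $text, string $sourceEncoding): string
{
    return mb_convert_encoding($text, 'UTF-8', $sourceEncoding);
}

// "café" as stored in an ISO-8859-1 file: é is the single byte 0xE9.
$latin1 = "caf\xE9";
$utf8   = to_utf8($latin1, 'ISO-8859-1');

var_dump(bin2hex($utf8)); // string(10) "636166c3a9"
```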

Seneca answered 22/10, 2009 at 8:40 Comment(0)

Kindly note that these php.ini entries are DEPRECATED (as of PHP 5.6):

;mbstring.internal_encoding = utf-8
;mbstring.http_input =
;mbstring.http_output = utf-8

Next ...

PHP - Set utf8 for the following - via a config.php file for your web app

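 ini_set('default_charset', 'UTF-8');
 mb_internal_encoding('UTF-8');
 // Note: iconv_set_encoding() with these types is itself deprecated as of PHP 5.6;
 // on newer versions, setting default_charset above is the recommended replacement.
 iconv_set_encoding('internal_encoding', 'UTF-8');
 iconv_set_encoding('output_encoding', 'UTF-8');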

MariaDB / MySQL - Set utf8 via:

 $mysqli->set_charset('utf8mb4'); // $mysqli is your open mysqli connection

HTML Pages - Set via:

 <meta charset="utf-8" > 
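
For the MariaDB / MySQL step above, this is roughly how the set_charset() call fits into opening a connection. It is a sketch, not a drop-in: open_utf8_connection is a hypothetical helper, and the host, credentials and database name are placeholders supplied by the caller.

```php
<?php
// Hypothetical helper: open a mysqli connection that talks utf8mb4.
function open_utf8_connection(string $host, string $user, string $password, string $database): mysqli
{
    // Throw exceptions on failure instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $password, $database);

    // Unlike a "SET NAMES" query, set_charset() also informs the client library,
    // so real_escape_string() escapes according to the right charset.
    $mysqli->set_charset('utf8mb4');

    return $mysqli;
}
```

The tables and columns still need a utf8mb4 character set of their own; the connection charset only covers the data in transit.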
  
Chaffin answered 21/8, 2018 at 14:44 Comment(0)

If mbstring isn't already part of your PHP package, then I would definitely recommend it; you'll even want to use it for calculating string lengths ( mb_strlen($string_var, 'utf8') ) for form input. Otherwise you won't need anything except valid and proper HTML, a correct HTTP server config (so the server delivers pages using UTF-8) and a text editor with UTF-8 support (e.g. Notepad++).
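
The string length point is easy to see with a single accented character. A small sketch, assuming the mbstring extension is available; within_max_length is a made-up validator name:

```php
<?php
// Hypothetical validator: limit form input by characters, not bytes.
function within_max_length(string $input, int $maxChars): bool
{
    return mb_strlen($input, 'UTF-8') <= $maxChars;
}

$name = "h\u{00E9}llo"; // "héllo"

var_dump(strlen($name));                 // int(6): bytes
var_dump(mb_strlen($name, 'UTF-8'));     // int(5): characters
var_dump(within_max_length($name, 5));   // bool(true)
```

With plain strlen() the same input would be rejected by a five-character limit, because é takes two bytes in UTF-8.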

Buchanan answered 22/10, 2009 at 8:35 Comment(0)

In your php.ini, set

mbstring.internal_encoding = UTF-8
mbstring.encoding_translation = On

so that you don't need to pass an encoding parameter to the mb_ functions every time.
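
The same convenience is available at runtime through mb_internal_encoding(), which may be the safer route now that the mbstring.internal_encoding ini entry is deprecated as of PHP 5.6. A minimal sketch, assuming the call sits in a bootstrap file that runs before any string handling:

```php
<?php
// Set the internal encoding once, e.g. in a bootstrap file.
mb_internal_encoding('UTF-8');

$word = "na\u{00EF}ve"; // "naïve"

// No encoding argument needed from here on.
$length = mb_strlen($word);        // 5
$upper  = mb_strtoupper($word);    // "NAÏVE"
$start  = mb_substr($word, 0, 3);  // "naï"
```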

Dionysian answered 22/10, 2009 at 8:36 Comment(0)
