- What does this do exactly? How is the behaviour of the script affected by this directive?
From php.ini:
; Allows to set the default encoding for the scripts. This value will be used
; unless "declare(encoding=...)" directive appears at the top of the script.
; Only affects if zend.multibyte is set.
; Default: ""
;zend.script_encoding =
From php.net:
handled as the file is being compiled....
A script's encoding can be specified per-script using the encoding directive.
In other words, if the zend.multibyte directive is enabled, an optional declare(encoding=...) directive at the top of each PHP file can state that file's character encoding, and zend.script_encoding supplies the default for files that have no such declaration. Your PHP files can therefore be saved in different encodings, as long as each one declares its encoding at the top, and the string literals in each file are transparently converted at compile time to the internal_encoding set in php.ini (tested in PHP 7.4.6). The default_charset and internal_encoding configuration options are not changed, and your code is unaware of the original encodings because the conversion has already taken place at compile time.
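A minimal sketch of such a file, assuming (hypothetically) that it is saved on disk as Windows-1252 and that php.ini contains zend.multibyte = On and internal_encoding = UTF-8. The literal and the variable name are illustrative only. If zend.multibyte is off, PHP ignores the declaration with a warning and the bytes stay as they were saved.

```php
<?php
declare(encoding='Windows-1252'); // must be the first statement in the script

// Assumed setup (hypothetical, not taken from the answer above):
//   - this file is saved on disk as Windows-1252
//   - php.ini: zend.multibyte = On, internal_encoding = UTF-8
$literal = 'café';

// Dump the bytes the running code actually sees for the literal.
echo bin2hex($literal), PHP_EOL;
```

Comparing the hex dump with zend.multibyte on and off is a simple way to check whether the compile-time conversion took place.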
- How does this differ from setting the directives mbstring.internal_encoding (before PHP 5.6) and default_charset (as of PHP 5.6) or using the mb_internal_encoding() function?
internal_encoding directive (formerly mbstring.internal_encoding)
The character encoding declared at the top of each file is the actual encoding of that file, while the internal_encoding setting in php.ini is the desired character encoding. So if you want your code to see UTF-8 but your PHP files are saved in Windows-1252, you could set internal_encoding in php.ini to UTF-8 and put a declare directive at the top of each file stating that it is encoded as Windows-1252. The string literals in those files will then be converted to UTF-8 at compile time. (Tested in PHP 7.4.6)
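To make the "actual encoding" versus "desired encoding" distinction concrete, here is a rough run-time stand-in for that conversion. It is only an illustration: toInternalEncoding is a hypothetical helper, and the real conversion happens inside the compiler, not through this function call.

```php
<?php
// Hypothetical helper: convert bytes from a file's declared encoding
// to the desired internal encoding, as a run-time stand-in for the
// compile-time conversion.
function toInternalEncoding(string $bytes, string $declared, string $internal = 'UTF-8'): string
{
    return mb_convert_encoding($bytes, $internal, $declared);
}

$windows1252 = "caf\xE9";            // "café" as Windows-1252 bytes
$utf8 = toInternalEncoding($windows1252, 'Windows-1252');

echo bin2hex($windows1252), PHP_EOL; // 636166e9
echo bin2hex($utf8), PHP_EOL;        // 636166c3a9
```

The single byte E9 becomes the two-byte UTF-8 sequence C3 A9, which is what code running with a UTF-8 internal encoding expects to see.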
php.net:
This setting is used for multibyte modules such as mbstring and iconv.
php.ini:
If empty, default_charset is used.
For more information, see the mb_internal_encoding() function below.
mb_internal_encoding function
Calling mb_internal_encoding() at run time tells the mb_* functions which multibyte encoding you are using. Functions such as mb_strtolower can then recognize your multibyte characters and replace them with their lowercase equivalents. If you don't set it at run time, the mb_* functions assume the encoding given by the internal_encoding directive in php.ini.
The mb_internal_encoding function executes at runtime and therefore can't be used to tell PHP what each PHP file's declared encoding should be converted to at compile time. (See above.)
From PHP.net:
[Set/Get] the character encoding name used for the HTTP input character encoding conversion, HTTP output character encoding conversion, and the default character encoding for string functions defined by the mbstring module. You should notice that the internal encoding is totally different from the one for multibyte regex.
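A short sketch of the run-time behaviour described in this section. The sample word is an assumption chosen for illustration, and it is written with byte escapes so the result does not depend on how this file itself is saved.

```php
<?php
// "ÉCOLE" encoded as UTF-8 (É is the two bytes C3 89).
$word = "\xC3\x89COLE";

// Tell the mb_* functions which encoding to assume by default.
mb_internal_encoding('UTF-8');

$lower = mb_strtolower($word);                // uses the internal encoding
echo bin2hex($lower), PHP_EOL;                // c3a9636f6c65 ("école")

// The encoding can also be passed per call, overriding the default.
echo mb_strlen($word, 'UTF-8'), PHP_EOL;      // 5 characters
echo mb_strlen($word, 'ISO-8859-1'), PHP_EOL; // 6 (one per byte)
```

None of this touches how the source file was compiled; it only changes how the mb_* functions interpret the bytes they are given.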
default_charset directive
Setting the default_charset directive tells PHP what charset to put in the Content-Type HTTP response header. For example: Content-Type: text/html; charset=UTF-8
This directive also tells PHP which character encoding certain functions, such as htmlspecialchars and htmlentities, should assume. For example, if your default_charset is UTF-8 but your database is set to use latin1 (in MySQL, latin1 is effectively Windows-1252), htmlspecialchars will have trouble with non-ASCII characters unless Windows-1252 is passed as the encoding, because Windows-1252 text contains byte sequences that are invalid in UTF-8. default_charset is also used as the internal_encoding if internal_encoding is not explicitly set.
From php.net:
default_charset string
In PHP 5.6 onwards, "UTF-8" is the default value and its value is used as the default character encoding for htmlentities(),
html_entity_decode() and htmlspecialchars() if the encoding parameter
is omitted. The value of default_charset will also be used to set the
default character set for iconv functions if the iconv.input_encoding,
iconv.output_encoding and iconv.internal_encoding configuration
options are unset, and for mbstring functions if the
mbstring.http_input mbstring.http_output mbstring.internal_encoding
configuration option is unset.
All versions of PHP will use this value as the charset within the default Content-Type header sent by PHP if the header isn't overridden
by a call to header().
Setting default_charset to an empty value is not recommended.
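The htmlspecialchars problem mentioned earlier can be reproduced with a small sketch. The sample string is an assumption standing in for data read from a latin1 database, and the encoding is passed explicitly here so the result does not depend on the local php.ini. When the third argument is omitted, default_charset is used instead.

```php
<?php
// "café <b>" as Windows-1252 bytes; 0xE9 on its own is not valid UTF-8.
$fromDb = "caf\xE9 <b>";

// Treated as UTF-8 (the PHP 5.6+ default_charset): the invalid byte makes
// htmlspecialchars return an empty string when neither ENT_SUBSTITUTE
// nor ENT_IGNORE is set.
$asUtf8 = htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8');

// Treated as Windows-1252: only the markup characters are escaped.
$asCp1252 = htmlspecialchars($fromDb, ENT_QUOTES, 'Windows-1252');

var_dump($asUtf8 === '');                    // bool(true)
var_dump($asCp1252 === "caf\xE9 &lt;b&gt;"); // bool(true)
```

Passing the real encoding of the data, or converting the data to UTF-8 first, avoids the empty result.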