What does htmlentities with ENT_QUOTES and UTF-8 do?

I have always used a plain htmlentities($_POST['string']); to escape data against XSS attacks. Recently I have seen people use this instead:

htmlentities($_POST['string'], ENT_QUOTES, 'UTF-8');

What is the advantage or purpose of using that over just htmlentities()?

Also, I don't know if it is relevant, but I always declare UTF-8 in a meta tag at the top of my pages.

Tumefy answered 1/6, 2013 at 7:19 Comment(0)
A
13

ENT_QUOTES is needed if the data is being substituted into an HTML attribute, e.g.

echo '<input type="text" value="' . htmlentities($string, ENT_QUOTES) . '">';

This ensures that quotes are encoded, so they won't terminate the value="..." attribute prematurely.
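To make the difference concrete, here is a minimal sketch comparing ENT_COMPAT with ENT_QUOTES. The sample input is made up for illustration, and the flags are passed explicitly because the default flags differ between PHP versions.

```php
<?php
// Hypothetical user input containing both a single and a double quote.
$string = 'O\'Reilly said "hi"';

// ENT_COMPAT encodes double quotes only.
$compat = htmlentities($string, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES encodes both double and single quotes.
$quotes = htmlentities($string, ENT_QUOTES, 'UTF-8');

echo $compat, "\n"; // O'Reilly said &quot;hi&quot;
echo $quotes, "\n"; // O&#039;Reilly said &quot;hi&quot;

// Only the ENT_QUOTES result is safe inside a single-quoted attribute.
echo "<input type='text' value='" . $quotes . "'>\n";
```

With ENT_COMPAT the apostrophe passes through unchanged, so it would close a value='...' attribute early. Inside value="..." both variants are safe.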

UTF-8 is necessary if your page uses the UTF-8 charset, because the default is to use ISO-8859-1 encoding. These encodings need to match, or the user will see strange characters.
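As a rough illustration of the "strange characters", the sketch below encodes the same UTF-8 string twice, once with the matching charset and once with ISO-8859-1. The sample word is an assumption chosen for the demonstration, and the "\u{...}" escape requires PHP 7 or later.

```php
<?php
// "café" stored as UTF-8 bytes; the é occupies two bytes (0xC3 0xA9).
$string = "caf\u{00E9}";

// Matching charset: the two bytes are read as one character.
$utf8 = htmlentities($string, ENT_QUOTES, 'UTF-8');

// Mismatched charset: each byte is treated as a separate Latin-1 character.
$latin1 = htmlentities($string, ENT_QUOTES, 'ISO-8859-1');

echo $utf8, "\n";   // caf&eacute;
echo $latin1, "\n"; // caf&Atilde;&copy;
```

A browser renders the second result as "cafÃ©", which is the kind of garbling a charset mismatch produces.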

Abnormal answered 1/6, 2013 at 7:26 Comment(5)
The default has changed in PHP 5.4; now the default is UTF-8. – Millman
In that case, the benefit of putting the charset in the call is that it will work the same in all versions of PHP. – Abnormal
So from what you said, I should always use this method: htmlentities with ENT_QUOTES and UTF-8, wherever I am echoing the string out. I mean, if I use ENT_QUOTES on a string that is not part of an HTML attribute, there should be no problem? – Tumefy
Correct. The only harm is that the HTML source will be a little harder to read, since it will be littered with &quot; and &#039;. – Abnormal
ENT_QUOTES is NOT needed by default. The default ENT_COMPAT already takes care of double quotation marks. Only if you want to put your output between single quotation marks would you need ENT_QUOTES. I would recommend setting ENT_QUOTES to be on the safe side, but this doesn't mean that a simple "htmlentities()" as used by the OP is unsafe. – Game
G
1

People state the quote flag and the character encoding for two reasons:

  ENT_QUOTES encodes both encapsulation characters, ' and ",

and 'UTF-8' tells the function which encoding the input uses. The whole statement is written as:
  htmlentities($_POST['string'], ENT_QUOTES, $encoding = "UTF-8");
(which also assigns the string to a variable named $encoding) or simply as:
  htmlentities($_POST['string'], ENT_QUOTES, "UTF-8");

The main reason to state the character encoding in the call is to fix how the input bytes are interpreted. If the input contains byte sequences that are invalid in that encoding, whether through corruption in transit or deliberate tampering, htmlentities() returns an empty string unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set.
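For reference, here is a small sketch of how htmlentities() treats bytes that are not valid in the stated encoding. The broken input string is invented for the example. The flags are passed explicitly, since newer PHP versions include ENT_SUBSTITUTE in the default flags.

```php
<?php
// Hypothetical corrupted input: the byte 0xFF can never appear in valid UTF-8.
$broken = "abc\xFFdef";

// Without ENT_IGNORE or ENT_SUBSTITUTE, invalid input yields an empty string.
$strict = htmlentities($broken, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE replaces the invalid byte with U+FFFD (the replacement character).
$substituted = htmlentities($broken, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict);                              // string(0) ""
var_dump($substituted === "abc\u{FFFD}def");    // bool(true)
```

ENT_SUBSTITUTE is generally preferable to ENT_IGNORE, because silently dropping bytes can have security implications.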

Galegalea answered 15/12, 2019 at 0:23 Comment(0)
