htmlspecialchars outputting blank

Both htmlspecialchars and htmlentities produce blank output for text that contains items such as a symbol or even single ' quotes. That makes them useless here, but outputting the data without escaping shows � in place of both. Any reason why this is occurring?

Here is the code that is causing the problem:

<p>
<?php 
    echo nl2br(htmlspecialchars($aboutarray[0]['about_us'], ENT_COMPAT, "UTF-8")); 
?>
</p>
Globule answered 26/6, 2012 at 15:11 Comment(16)
Sounds like a charset issue. Are you sure that your data is UTF-8-encoded? - Cumings
I may be misunderstanding your problem, but I tried this on ideone.com and it seems to work fine: ideone.com/P298n - Tentacle
@EmilVikström How do I go about making sure of this? - Globule
@EricH Yeah, it works fine on one of my websites, but on the other, with identical code, it outputs incorrectly. - Globule
@Globule You could try utf8_encode(): php.net/manual/en/function.utf8-encode.php - Tentacle
Where is $aboutarray[0]['about_us'] coming from? - Strow
This will give you the byte sequence of the string: for ($i = 0; $i < strlen($string); $i++) printf('%d ', ord($string[$i])); - Cumings
@EricH Using utf8_encode worked. Now I'm confused about why this is necessary on one site when the text outputs properly from the get-go on the other. - Globule
@Strow I didn't include the query for the array, but the value is text, and I have confirmed the output without using htmlspecialchars or htmlentities. - Globule
The collation in the database is utf8_general_ci. - Globule
If utf8_encode worked, that means the data was actually encoded in Latin-1. You may want to read this: kunststube.net/frontback - Strow
What's the connection charset (set during connection to the database)? - Cumings
The data I took was copy-pasted from an old site (migrating the site, not stealing anything). Could that be why the text was encoded in Latin-1? - Globule
@EmilVikström In the header of the site I have the meta tag <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. As for the database connection, I am using ADODB, which doesn't have any issues on the first site. I haven't specified an encoding as far as I know. - Globule
Upon further inspection, utf8_encode is just removing the trademark symbol. - Globule
@Globule Check out mb_detect_encoding(): php.net/manual/en/function.mb-detect-encoding.php or mb_check_encoding(): php.net/manual/en/function.mb-check-encoding.php. Those may be of assistance in tracking down the issue. - Tentacle

That string is not valid UTF-8. It could be in another encoding such as UTF-16, or it may just contain binary junk that doesn't correspond to any encoding.

The bottom line is that, since you specified "UTF-8" as the encoding parameter of htmlspecialchars(), it returns an empty string whenever the input contains a byte sequence that is not valid UTF-8, unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set. The PHP manual states this.
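To see the behaviour in isolation, here is a minimal sketch. The sample string is made up for illustration: it assumes the stray byte is 0x99 (the trademark sign in Windows-1252), which is only a guess about the asker's data, and it assumes the mbstring extension is available for the validity check.

```php
<?php
// Made-up sample: "\x99" is a lone continuation byte, so the string is not valid UTF-8.
$text = "Acme\x99 isn't valid";

// Same call shape as in the question: no ENT_SUBSTITUTE or ENT_IGNORE flag.
$escaped = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

var_dump(mb_check_encoding($text, 'UTF-8')); // bool(false)
var_dump($escaped === '');                    // bool(true): the whole string is discarded
```

Note that the single bad byte blanks the entire string, not just the offending character, which matches the "blank output" described in the question.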

A simple fix is to use the substitute or ignore flag. Change:

htmlspecialchars($aboutarray[0]['about_us'], ENT_COMPAT, "UTF-8")

To:

htmlspecialchars($aboutarray[0]['about_us'], ENT_COMPAT|ENT_SUBSTITUTE, "UTF-8")

Or:

htmlspecialchars($aboutarray[0]['about_us'], ENT_COMPAT|ENT_IGNORE, "UTF-8")

Note: ENT_IGNORE silently removes the non-compliant bytes, which could cause a security issue; ENT_SUBSTITUTE replaces them with the Unicode replacement character (U+FFFD) instead. Either way, it's better to truly understand the contents of your string and how it's being encoded. Correct the source of the problem rather than rely on the simple ENT_IGNORE fix.
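A short sketch of how the two flags differ, again using a made-up string with one invalid byte (0x99) rather than the asker's real data:

```php
<?php
// Made-up sample with one byte that is not valid UTF-8.
$text = "Acme\x99 <b>";

$substituted = htmlspecialchars($text, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
$ignored     = htmlspecialchars($text, ENT_COMPAT | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE keeps a visible marker: the bad byte becomes U+FFFD ("\xEF\xBF\xBD" in UTF-8).
var_dump($substituted === "Acme\xEF\xBF\xBD &lt;b&gt;"); // bool(true)

// ENT_IGNORE drops the bad byte without a trace.
var_dump($ignored === "Acme &lt;b&gt;"); // bool(true)
```

Neither flag recovers the original character; both only stop the function from returning an empty string.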

You should ask yourself why your string is not encoded in UTF-8... it should be, but it's not.
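One way to correct the data rather than mask the problem is to convert it to UTF-8 before escaping. The sketch below is an assumption-heavy illustration: to_utf8() is a hypothetical helper, not a PHP built-in, and it guesses that anything which is not already valid UTF-8 is Windows-1252. That guess fits the comments (text copied from an old site, and utf8_encode() dropping the trademark symbol, since utf8_encode() treats input as ISO-8859-1, where byte 0x99 is an invisible control character), but it should be verified against the real bytes. It also assumes the mbstring extension is loaded.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8.
 * Assumption: input that is not valid UTF-8 is encoded as $assumedSource.
 */
function to_utf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// Made-up sample: 0x99 is the trademark sign in Windows-1252.
$raw  = "Acme\x99";
$utf8 = to_utf8($raw);

// Escaping now succeeds because the input is valid UTF-8.
echo nl2br(htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'));
```

Converting once, when the data is imported or by setting the database connection charset, is usually cleaner than converting on every page render.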

I happen to have just encountered this problem as well; you can read details on why an empty string is being returned here.

Savannahsavant answered 29/7, 2012 at 3:36 Comment(1)
P.S. I would also suggest changing ENT_COMPAT to ENT_QUOTES, but as always, I suppose that depends on your specific scenario.Savannahsavant
