htmlspecialchars utf-8 returns empty string

I'm writing an RSS generator in PHP and have a problem getting data from my database on this line:

<description><![CDATA[<?=htmlspecialchars(utf8_substr($row['texto'], 0, 100), ENT_QUOTES, 'utf-8') ?>...]]></description>

Some entries show up just fine, but others don't return any text. Any idea what could be wrong?

This is all the code:

<?php

require('php/config.php');
require('php/db.php');
require('php/utils.php');

header("Content-type: application/xml");

$db = new TSQL('SELECT * FROM entradas WHERE estado = 1 ORDER BY fecha DESC LIMIT 20');
if ( $db->executeQuery() ) {

?><?='<?xml version="1.0" encoding="utf-8" ?>' ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Tu Secreto</title>
        <link>http://www.tusecreto.com.ar/</link>
        <description>TuSecreto / No se lo cuentes a nadie, contaselo a todos</description>
        <language>es-ar</language>
        <copyright>TuSecreto (C) 2005-<?php print strftime("%Y", time()); ?></copyright>
        <lastBuildDate><?=strftime("%a, %d %b %Y %H:%M:%S ", $row['fecha']) ?></lastBuildDate>
        <atom:link href="http://www.tusecreto.com.ar/rss.php" rel="self" type="application/rss+xml" />
        <docs>http://www.tusecreto.com.ar/rss.php</docs>
        <generator>TuSecreto RSS Generator v1.0</generator>
        <ttl>10</ttl>
        <?php while ($row = $db->getRow(MYSQL_ASSOC)) { ?>
        <item>
            <title><?=($row['sexo'] == MUJER)?'Mujer':'Hombre' ?> | <?=$row['edad'] ?> <?="A\xC3\xB1os" ?></title>
            <description><![CDATA[<?=htmlspecialchars(utf8_substr($row['texto'], 0, 100), ENT_QUOTES, 'utf-8') ?>...]]></description>
            <link>http://www.tusecreto.com.ar/<?=$row['id'] ?></link>
            <guid isPermaLink="true">http://www.tusecreto.com.ar/<?=$row['id'] ?></guid>
            <pubDate><?=strftime("%a, %d %b %Y %H:%M:%S ", $row['fecha']) ?></pubDate>
        </item>
        <?php } ?>
    </channel>
</rss>

This is one result that returns an empty string:

una vez en el colectivo (sentada en el asiento individual) me dormí y cuando doblo me caí en el pasillo re mal! se mataron de la risa todos!! hasta el colectivero! Pasalo y comento con mi Facebook. E.P.

Saberio answered 18/6, 2012 at 18:37 Comment(3)
Give an example of some entries that return an empty string. – Mario
How is utf8_substr defined? – Mme
I've updated the post with all the code and one result. Maybe it's because of the accented characters, e.g. "dormí"? The text is Spanish. – Saberio
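
Accented characters such as "í" are not a problem by themselves; what matters is whether the bytes passed to htmlspecialchars are valid UTF-8. The sketch below assumes the mbstring extension is enabled and encodes the same word once as UTF-8 and once as ISO-8859-1 (Latin-1), then checks each with mb_check_encoding(). Latin-1 is only an assumed way the database could be returning the text; the question does not show how the column or the connection charset is configured.

```php
<?php
// "dormí" as UTF-8 bytes: the í is the two-byte sequence C3 AD.
$utf8 = "dorm\xC3\xAD";

// The same word re-encoded as ISO-8859-1 (Latin-1): í becomes the single byte ED.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Report whether each byte string is valid UTF-8.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```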

Your code uses htmlspecialchars($string, ENT_QUOTES, 'utf-8'). Quoting from the manual page:

If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.
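
A minimal reproduction of that documented behavior, assuming a string that ends in the lone byte 0xED (which is what "í" looks like in Latin-1, but is an incomplete sequence in UTF-8). The sample strings are illustrative, not taken from the asker's database.

```php
<?php
// Invalid UTF-8: 0xED is the lead byte of a three-byte sequence, but nothing follows it.
$invalid = "dorm\xED";

// Without ENT_SUBSTITUTE or ENT_IGNORE, the whole result is discarded.
$escaped = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');
var_dump($escaped); // string(0) ""

// The same word as valid UTF-8 is escaped normally (& and < > become entities).
var_dump(htmlspecialchars("dorm\xC3\xAD & <b>", ENT_QUOTES, 'UTF-8'));
```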

Use e.g. htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, 'utf-8') as a quick workaround.
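
A sketch of what the workaround does with the same kind of invalid input (the input string is an assumption for illustration). With UTF-8 as the target charset, each invalid sequence is replaced by U+FFFD, whose UTF-8 bytes are EF BF BD, so the rest of the text survives.

```php
<?php
// Assumed invalid input: "dorm" followed by a lone Latin-1 "í" byte.
$invalid = "dorm\xED";

// ENT_SUBSTITUTE (PHP 5.4.0+) replaces each invalid code unit sequence with
// U+FFFD instead of returning an empty string.
$escaped = htmlspecialchars($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($escaped === "dorm\xEF\xBF\xBD"); // bool(true)
```

The text is no longer blank, but the original character is lost, which is why the underlying encoding problem is still worth fixing.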

If invalid input is indeed your problem, of course, you should find out why utf8_substr($row['texto'], 0, 100) does not return a valid UTF-8 string in the first place.
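
The question never shows how utf8_substr is defined, so the following is only one possible cause: cutting by bytes instead of characters can split a multibyte character and leave invalid UTF-8 behind. The sketch, which assumes PHP 7+ and the mbstring extension, compares substr() with mb_substr(). The helper safe_excerpt is hypothetical and not part of the asker's code.

```php
<?php
// "dormí" is 5 characters but 6 bytes, because í is encoded as C3 AD.
$text = "dorm\xC3\xAD";

// substr() counts bytes, so cutting at 5 keeps only the first byte of í.
$byteCut = substr($text, 0, 5);             // "dorm\xC3": invalid UTF-8

// mb_substr() counts characters when given the encoding.
$charCut = mb_substr($text, 0, 5, 'UTF-8'); // "dormí": still valid

// Hypothetical helper: build an escaped excerpt, refusing input that is not UTF-8.
function safe_excerpt(string $text, int $length): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        // The stored data itself is not UTF-8; fix it at the source instead.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return htmlspecialchars(mb_substr($text, 0, $length, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
echo safe_excerpt("dorm\xC3\xAD & m\xC3\xA1s", 7), "\n"; // dormí &amp;
```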

Mcgean answered 27/10, 2012 at 22:27 Comment(2)
(1) The default isn't ENT_QUOTES but ENT_COMPAT (before PHP 8.1). (2) What the docs say about ENT_IGNORE: "Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications." (3) ENT_SUBSTITUTE is only available starting from PHP 5.4.0. – Cablet
Definitely do not ignore this error with ENT_IGNORE, as this has security implications, per the docs. – Amara
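
To illustrate point (1) in the first comment: ENT_COMPAT converts only double quotes, while ENT_QUOTES also converts single quotes. A small sketch with an illustrative string; the &#039; form is what the default ENT_HTML401 doctype produces.

```php
<?php
$s = "It's \"quoted\"";

// ENT_COMPAT converts double quotes only.
echo htmlspecialchars($s, ENT_COMPAT, 'UTF-8'), "\n"; // It's &quot;quoted&quot;

// ENT_QUOTES also converts the single quote (as &#039; with ENT_HTML401).
echo htmlspecialchars($s, ENT_QUOTES, 'UTF-8'), "\n"; // It&#039;s &quot;quoted&quot;
```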

This question still seems unresolved, and I recently solved a similar problem of my own: unusual characters made the function return an empty string. Here is what worked for me.

In the flags argument, add " | ENT_SUBSTITUTE" and change the encoding to "cp1252". The ENT_SUBSTITUTE flag replaces any invalid characters instead of producing an empty string. The "cp1252" encoding is the Windows Western European code page, however, so if it does not work for you, try the other encodings listed on the manual page (https://www.php.net/manual/en/function.htmlspecialchars.php). (I assumed this encoding worked for me because my server runs on Windows IIS.)
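
A sketch of why the cp1252 setting can help: if the stored text really is Windows-1252 (an assumption; the answer does not verify it), the byte 0xED is a valid "í" in that charset, so nothing is discarded. The second half is an alternative not taken from the answer: it assumes mbstring is enabled and converts the text to UTF-8 before escaping, so the output also matches a feed declared with encoding="utf-8".

```php
<?php
// Assumed input: "dormí & co" stored as Windows-1252, where í is the single byte 0xED.
$cp1252 = "dorm\xED & co";

// Declaring the matching charset makes the bytes valid, so nothing is discarded.
// The result is still Windows-1252 bytes, not UTF-8.
$asCp1252 = htmlspecialchars($cp1252, ENT_QUOTES, 'cp1252');

// Alternative: convert to UTF-8 first, then escape as UTF-8.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
$asUtf8 = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

var_dump($asCp1252 === "dorm\xED &amp; co");   // bool(true)
var_dump($asUtf8 === "dorm\xC3\xAD &amp; co"); // bool(true)
```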

EDIT: You also have the option to remove the encoding type in XML files, and PHP will work fine with it.

Dolabriform answered 3/2, 2020 at 5:22 Comment(0)