Converting a UTF-8 string into UTF-8 using iconv() with the //IGNORE parameter produces a result where invalid UTF-8 characters are dropped.
Therefore, you can detect a broken character by comparing the length of the string before and after the iconv operation. If the lengths differ, the string contained a broken character.
Test case (make sure you save the file as UTF-8):
<?php
header("Content-type: text/html; charset=utf-8");
$teststring = "Düsseldorf";
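// Deliberately create a broken string
// by encoding the original string as ISO-8859-1
// (note: utf8_decode() is deprecated as of PHP 8.2)
$teststring_broken = utf8_decode($teststring);
echo "Broken string: " . $teststring_broken;
echo "<br>";
// Re-encode as UTF-8; //IGNORE drops byte sequences that are not valid UTF-8
$teststring_converted = iconv("UTF-8", "UTF-8//IGNORE", $teststring_broken);
echo $teststring_converted;
echo "<br>";
// Different byte lengths mean something was dropped, so the input was not valid UTF-8
if (strlen($teststring_converted) != strlen($teststring_broken)) {
    echo "The string contained an invalid character";
}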
In theory, you could drop //IGNORE and simply test for a failed (empty) iconv operation, but there might be reasons for iconv to fail other than invalid characters... I don't know. I would use the comparison method.
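For comparison, the alternative mentioned above (no //IGNORE, test for failure) could be sketched roughly as follows. The helper name iconv_accepts_utf8 is hypothetical and not part of the original answer. The sketch assumes an iconv implementation that rejects illegal input sequences by returning false, and, as noted above, a failure does not prove that an invalid character was the cause.

```php
<?php
// Hypothetical helper (illustration only): reports whether iconv()
// converts $s from UTF-8 to UTF-8 without failing when //IGNORE is not used.
function iconv_accepts_utf8(string $s): bool
{
    // Without //IGNORE, iconv() returns false on an illegal byte sequence
    // and raises an E_NOTICE, which @ suppresses here.
    $converted = @iconv("UTF-8", "UTF-8", $s);

    // Strict comparison, so that a legitimately empty result ("")
    // is not mistaken for a failure.
    return $converted !== false;
}

// Written with byte escapes so the example does not depend on
// the encoding this file is saved in.
$valid  = "D\xC3\xBCsseldorf"; // "ü" as the UTF-8 sequence C3 BC
$broken = "D\xFCsseldorf";     // "ü" as the single ISO-8859-1 byte FC

var_dump(iconv_accepts_utf8($valid));
var_dump(iconv_accepts_utf8($broken));
```

The strict `!== false` check matters here: an empty input converts to an empty string, and testing the result for emptiness would report that as a failure.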
0x00 approach didn't work out? – Consternate
If the == (loose) comparison of the � character with 0x00 succeeds for someone, it can't be used for the � character detection since the == comparison with 0x00 will also pass if compared to "" or "0". You must use the === (strict) comparison of the � character with 0x00 which will most probably fail. – Thermoelectrometer