I am confused about the behavior of utf8_decode() and just want a little clarification. I hope that's ok.
Here's a simple HTML form that I'm using to capture some text and save it to my MySQL database (which uses the utf8_general_ci collation):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form action="update.php" method="post" accept-charset="utf-8">
<p>
Title: <input type="text" name="title" id="title" accept-charset="utf-8" size="75" value="" />
</p>
<p>
<input type="submit" name="submit" value="Submit" />
</p>
</form>
</body>
</html>
As you can see, I've got this coded up with charset=utf-8 in the appropriate places. We accept text that includes diacritics (e.g., ñ, ó). In the end, we run a little script on all text input to check for diacritics and change them to HTML entities (e.g., ñ becomes &ntilde;).
When input is received by my script, I first have to call utf8_decode($input) and then run my little script to check for and change diacritics as needed. Everything works fine, but I'm curious as to why I have to run the decode on this input. I understand that utf8_decode() converts a string encoded in UTF-8 to ISO-8859-1. Even though everything seems to work, I want to make sure I'm not doing something screwy that will catch up with me later; for instance, sending ISO-8859-1 encoded characters to a database that is set up to store and serve UTF-8. Should I run utf8_encode() on the string that my diacritics-to-entities script returns? E.g.:
$string = utf8_decode($string);
$search = explode(",","À,È,Ì,Ò,Ù,à,è,ì,ò,ù,Á,É,Í,Ó,Ú,Ý,á,é,í,ó,ú,ý,Â,Ê,Î,Ô,Û,â,ê,î,ô,û,Ã,Ñ,Õ,ã,ñ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,ä,ë,ï,ö,ü,ÿ,Å,å,Æ,æ,ß,Þ,þ,ç,Ç,Œ,œ,Ð,ð,Ø,ø,§,Š,š,µ,¢,£,¥,€,¤,ƒ,¡,¿");
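$replace = explode(",","&Agrave;,&Egrave;,&Igrave;,&Ograve;,&Ugrave;,&agrave;,&egrave;,&igrave;,&ograve;,&ugrave;,&Aacute;,&Eacute;,&Iacute;,&Oacute;,&Uacute;,&Yacute;,&aacute;,&eacute;,&iacute;,&oacute;,&uacute;,&yacute;,&Acirc;,&Ecirc;,&Icirc;,&Ocirc;,&Ucirc;,&acirc;,&ecirc;,&icirc;,&ocirc;,&ucirc;,&Atilde;,&Ntilde;,&Otilde;,&atilde;,&ntilde;,&otilde;,&Auml;,&Euml;,&Iuml;,&Ouml;,&Uuml;,&Yuml;,&auml;,&euml;,&iuml;,&ouml;,&uuml;,&yuml;,&Aring;,&aring;,&AElig;,&aelig;,&szlig;,&THORN;,&thorn;,&ccedil;,&Ccedil;,&OElig;,&oelig;,&ETH;,&eth;,&Oslash;,&oslash;,&sect;,&Scaron;,&scaron;,&micro;,&cent;,&pound;,&yen;,&euro;,&curren;,&fnof;,&iexcl;,&iquest;");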
$new_input = str_replace($search, $replace, $string);
return utf8_encode($new_input); // Right now I just return $new_input.
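To make the concern concrete, here is a minimal sketch of what I understand happens to the bytes at each step. The sample string "mañana" is made up for illustration and written with byte escapes so the result does not depend on how the .php file itself is saved. The `@` is only there because utf8_decode() and utf8_encode() are deprecated as of PHP 8.2; this is a sketch of my understanding, not a claim about what the right fix is.

```php
<?php
// "mañana" with ñ as its two UTF-8 bytes (C3 B1).
$utf8 = "ma\xC3\xB1ana";

// utf8_decode() converts UTF-8 to ISO-8859-1, so ñ becomes the single byte F1.
$latin1 = @utf8_decode($utf8);

echo bin2hex($utf8), "\n";   // 6d61c3b1616e61
echo bin2hex($latin1), "\n"; // 6d61f1616e61

// utf8_encode() goes the other way, so the round trip restores the original bytes.
$roundTrip = @utf8_encode($latin1);
var_dump($roundTrip === $utf8); // bool(true)

// Once the ISO-8859-1 ñ is replaced by an entity, the string is plain ASCII,
// which is byte-identical in ISO-8859-1 and UTF-8, so encoding it changes nothing.
$entities = str_replace("\xF1", "&ntilde;", $latin1);
var_dump(@utf8_encode($entities) === $entities); // bool(true)
```

If that is right, the utf8_encode() call would only matter for characters that my search list misses, since those would still be sitting in the string as ISO-8859-1 bytes.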
Appreciate any insight anyone has to offer about this.